PRECISION-BASED SAMPLE SIZE REDUCTION FOR BAYESIAN EXPERIMENTATION USING MARKOV CHAIN SIMULATION

by

David J. Huber

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY (BIOMEDICAL ENGINEERING)

December 2007

Copyright 2007 David J. Huber

Epigraph

"No knowledge can be certain, if it is not based upon mathematics or upon some other knowledge which is itself based upon the mathematical sciences." - Leonardo da Vinci

"Oh, people can come up with statistics to prove anything, Kent. Forty percent of people know that." - Homer J. Simpson

Acknowledgements

I would like to thank all of the people that made this work possible, whether it was by intellectual, moral, or emotional support. I would especially like to thank the members of my committee for supporting my vision for this work. These people include Alan Schumizky, who provided advice through emails, phone calls, and meetings, and pointed me in the direction of some key papers when ideas were running thin; Stanley Yamashiro, who took on the role of committee chair and provided data and advice; and Jean-Michel Maarek, who also shared data and was a constant source of advice and support throughout all of my many years at U.S.C., including undergraduate. I would also like to thank Jesse Yen and Tzung Hsiai for sitting on my candidacy exam committee and Robert Kalaba for providing advice and direction when I was just starting this work.

I also want to thank my family and friends for supporting and encouraging me throughout this process, especially during the times when it looked like I would never finish. I never could have stayed focused at work if I didn't have wonderful people to spend my free time with. I want to thank my mom for all of the sacrifices she made to ensure that I got to this point, and for teaching me the value of hard work, perseverance, and sometimes, even stubbornness. I would like to thank Rodel and Tamra Cruz-Herrera for loaning me various computers to help with all of the number crunching and so many programming books. If not for all of the extra computing resources, this work would have taken even longer to finish. Thank you for being such good friends.

Finally, I would like to thank all of the teachers that I have had over the years that helped me build the broad academic foundation that I drew on constantly as I carried out this work. I am constantly amazed at how things I learned years ago and didn't give a second thought are such invaluable tools now. And since I probably didn't thank you back then, I am now... Thanks.

Table of Contents

Epigraph
Acknowledgements
List of Tables
List of Figures
Nomenclature
Abstract
Chapter 1 Introduction
  1.1 Motivation
  1.2 Objectives and Claims of This Research
  1.3 Overview and Structure of Paper
Chapter 2 Design of Experiments and Sample Size Determination
  2.1 Fundamentals of Experimentation
  2.2 Bayesian Experimentation
    2.2.1 The Bayesian Framework
    2.2.2 Defining the Prior Distribution
    2.2.3 Variance Models
  2.3 Optimal Experiment Design
    2.3.1 Design Criteria
    2.3.2 Selecting a Global Optimization Algorithm
  2.4 Experimental Analysis
    2.4.1 Parameter Estimation
    2.4.2 Measures of Experimental Precision
Chapter 3 Existence of the Optimal Sample Size
  3.1 Information Provided by an Experiment
  3.2 Diminishing Marginal Utility of Information
  3.3 Prior Art: Determining the Optimal Sample Size
Chapter 4 Posterior Sampling and Markov Chain Simulation
  4.1 Introduction to Markov Processes
  4.2 Posterior Simulation Using Markov Chains
  4.3 Preposterior Distributions and Posterior Predictive Simulation
  4.4 Diagnostics to Monitor the Convergence of a Markov Process
  4.5 Precision-based Sample Size Determination
    4.5.1 Quality Control and Acceptance Sampling
    4.5.2 Application of Quality Control to Experiment Design
    4.5.3 Posterior Predictive Simulation and Virtual Acceptance Sampling
  4.6 Implementing the Decision Criterion
Chapter 5 Methods of Implementation
  5.1 Programming and Naming Conventions
  5.2 Architecture of the Application Programming Interface
    5.2.1 Vector, Matrix, and Markov Chain Operators
    5.2.2 The Experimental Framework
    5.2.3 Optimal Design and Sample Size Criteria
  5.3 Other Computational Considerations
    5.3.1 Domain Constraints
    5.3.2 Derivatives and Gradients
    5.3.3 Random Number Generation
    5.3.4 Finding the Mode of a Markov Chain
  5.4 Software Validation
Chapter 6 Evaluation of the Proposed Algorithm
  6.1 Phases of Evaluation
  6.2 Phase One: Computer Simulation
    6.2.1 Method of Evaluation
    6.2.2 Experimental Models
  6.3 Phase Two: Practical Demonstration
    6.3.1 Method of Evaluation
    6.3.2 Experimental Models
Chapter 7 Results of Evaluation
  7.1 Results from Simulated Experiments
    7.1.1 Trials 1:01 through 1:04 – Exponential Decay
    7.1.2 Trials 2:01 through 2:04 – Exponential Rise and Fall
    7.1.3 Trials 3:01 through 3:04 – Hill Sigmoid
  7.2 Results from Practical Experiments
    7.2.1 Sallen-Key Low Pass Filter
    7.2.2 Fluorescence of Indocyanine Green in Blood
    7.2.3 Determination of Anaerobic Threshold
  7.3 Discussion
    7.3.1 Behavior of the Algorithm
    7.3.2 Sample Size Reduction
    7.3.3 Computation Time
Chapter 8 Outlook
  8.1 Conclusions
  8.2 Advantages of the Method
  8.3 Limitations of the Method
  8.4 Future Work and Directions of Research
References
Appendices
  Appendix A XRedDesign API Documentation
  Appendix B XRedDesign Software Validation Routines and Results
  Appendix C Program Code for this Work

List of Tables

Table 2.01 Test Functions for Evaluating Global Optimization Algorithms
Table 5.01 Prefix Notation Used for Variable and Class Types in the XRedDesign API
Table 5.02 Random Number Requirements for Each Algorithm
Table 5.03 Validation Stages for the XRedDesign API Algorithms
Table 6.01 Framework for Simulated Experiments for First Phase of Evaluation
Table 6.02 Phase One Experimental Models and Their Applications
Table 6.03 Framework for Sallen-Key Low Pass Filter Experiment
Table 6.04 Experimental Trials for Sallen-Key Low Pass Filter
Table 6.05 Experimental Trials for ICG Fluorescence
Table 6.06 Framework for ICG Fluorescence Experiment
Table 6.07 Framework for the Determination of Anaerobic Threshold
Table 7.01 Optimal Sample Sizes for Trials Using the Exponential Decay Model
Table 7.02 Optimal Sample Sizes for Trials Using the Exponential Rise and Fall Model
Table 7.03 Optimal Sample Sizes for Trials Using the Hill Sigmoid Model

List of Figures

Figure 2.01 A Grey-Box Experiment
Figure 2.02 Univariate and Bivariate Prior Distribution Types
Figure 2.03 Optimal Designs for Some Optimal Design Criteria
Figure 2.04 Parameter Mesh for Computing the Expectation in an Optimal Design
Figure 2.05 Evaluation of Some Methods of Global Optimization
Figure 2.06 Flowchart of the Random Creep Optimization Algorithm
Figure 2.07 The Bayesian Credibility Interval
Figure 3.01 Redundancy of Information between Independent Experiments
Figure 3.02 Experimental Sample Size and Diminishing Marginal Utility
Figure 4.01 Flowchart of the Independence Metropolis Hastings Algorithm
Figure 4.02 Preposterior Sampling by Transformation and Resampling
Figure 4.03 Control Chart for a Random Fabrication Process
Figure 4.04 Sparse Sampling from a Manufacturing Distribution
Figure 4.05 Posterior Precision and the Control Requirement
Figure 4.06 Elliptical Control Regions with Maximum Boundary Points
Figure 4.07 Consumer's Risk from Acceptance Sampling
Figure 4.08 Flowchart of the Proposed Method for Sample Size Determination
Figure 5.01 XRedDesign API Syntax
Figure 5.02 Mapping of Object Relationships in the XRedDesign API
Figure 5.03 Domain Constraints for Experimental Models
Figure 5.04 Posterior Reconstruction from a Markov Chain Using KDE
Figure 6.01 Phase One of the Evaluation Procedure
Figure 6.02 Exponential Decay Response Range over Prior and Error Distributions
Figure 6.03 Exponential Rise and Fall Response Range over Prior and Error Distributions
Figure 6.04 Hill Sigmoid Response Range over Prior and Error Distributions
Figure 6.05 Phase Two of the Evaluation Procedure
Figure 6.06 Schematic for a Sallen-Key Low Pass Filter
Figure 6.07 Sallen-Key Low Pass Filter Response Range over Prior and Error Distributions
Figure 6.08 Low Pass Filter Circuits Used in This Experiment
Figure 6.09 ICG Fluorescence Response Range over Prior and Error Distributions
Figure 6.10 Anaerobic Threshold Response Range over Prior and Error Distributions
Figure 6.11 Collection of Ventilation Data for the Determination of Anaerobic Threshold
Figure 7.01 EID-optimal Designs for Trial 1:01
Figure 7.02 EID-optimal Designs for Trial 1:02
Figure 7.03 EID-optimal Designs for Trial 1:03
Figure 7.04 EID-optimal Designs for Trial 1:04
Figure 7.05 MPSRF Posterior Diagnostic for Trial 1:01
Figure 7.06 MPSRF Posterior Diagnostic for Trial 1:02
Figure 7.07 MPSRF Posterior Diagnostic for Trial 1:03
Figure 7.08 MPSRF Posterior Diagnostic for Trial 1:04
Figure 7.09 Sample Size Determination for Trial 1:01
Figure 7.10 Sample Size Determination for Trial 1:02
Figure 7.11 Sample Size Determination for Trial 1:03
Figure 7.12 Sample Size Determination for Trial 1:04
Figure 7.13 90% Credibility Intervals Using Optimal Sample Sizes for Trial 1:01
Figure 7.14 90% Credibility Intervals Using Optimal Sample Sizes for Trial 1:02
Figure 7.15 90% Credibility Intervals Using Optimal Sample Sizes for Trial 1:03
Figure 7.16 90% Credibility Intervals Using Optimal Sample Sizes for Trial 1:04
Figure 7.17 Experimental Accuracy and Information for Trial 1:01
Figure 7.18 Experimental Accuracy and Information for Trial 1:02
Figure 7.19 Experimental Accuracy and Information for Trial 1:03
Figure 7.20 Experimental Accuracy and Information for Trial 1:04
Figure 7.21 EID-optimal Designs for Trial 2:01
Figure 7.22 EID-optimal Designs for Trial 2:02
Figure 7.23 EID-optimal Designs for Trial 2:03
Figure 7.24 EID-optimal Designs for Trial 2:04
Figure 7.25 MPSRF Posterior Diagnostic for Trial 2:01
Figure 7.26 MPSRF Posterior Diagnostic for Trial 2:02
Figure 7.27 MPSRF Posterior Diagnostic for Trial 2:03
Figure 7.28 MPSRF Posterior Diagnostic for Trial 2:04
Figure 7.29 Sample Size Determination for Trial 2:01
Figure 7.30 Sample Size Determination for Trial 2:02
Figure 7.31 Sample Size Determination for Trial 2:03
Figure 7.32 Sample Size Determination for Trial 2:04
Figure 7.33 90% Credibility Intervals Using Optimal Sample Sizes for Trial 2:01
Figure 7.34 90% Credibility Intervals Using Optimal Sample Sizes for Trial 2:02
Figure 7.35 90% Credibility Intervals Using Optimal Sample Sizes for Trial 2:03
Figure 7.36 90% Credibility Intervals Using Optimal Sample Sizes for Trial 2:04
Figure 7.37 Experimental Accuracy and Information for Trial 2:01
Figure 7.38 Experimental Accuracy and Information for Trial 2:02
Figure 7.39 Experimental Accuracy and Information for Trial 2:03
Figure 7.40 Experimental Accuracy and Information for Trial 2:04
Figure 7.41 EID-optimal Designs for Trial 3:01
Figure 7.42 EID-optimal Designs for Trial 3:02
Figure 7.43 EID-optimal Designs for Trial 3:03
Figure 7.44 EID-optimal Designs for Trial 3:04
Figure 7.45 MPSRF Posterior Diagnostic for Trial 3:01
Figure 7.46 MPSRF Posterior Diagnostic for Trial 3:02
Figure 7.47 MPSRF Posterior Diagnostic for Trial 3:03
Figure 7.48 MPSRF Posterior Diagnostic for Trial 3:04
Figure 7.49 Sample Size Determination for Trial 3:01
Figure 7.50 Sample Size Determination for Trial 3:02
Figure 7.51 Sample Size Determination for Trial 3:03
Figure 7.52 Sample Size Determination for Trial 3:04
Figure 7.53 90% Credibility Intervals Using Optimal Sample Sizes for Trial 3:01
Figure 7.54 90% Credibility Intervals Using Optimal Sample Sizes for Trial 3:02
Figure 7.55 90% Credibility Intervals Using Optimal Sample Sizes for Trial 3:03
Figure 7.56 90% Credibility Intervals Using Optimal Sample Sizes for Trial 3:04
Figure 7.57 Experimental Accuracy and Information for Trial 3:01
Figure 7.58 Experimental Accuracy and Information for Trial 3:02
Figure 7.59 Experimental Accuracy and Information for Trial 3:03
Figure 7.60 Experimental Accuracy and Information for Trial 3:04
Figure 7.61 MPSRF Posterior Diagnostic for Sallen-Key Low Pass Filter
Figure 7.62 Sample Size Determination for Sallen-Key Low Pass Filter
Figure 7.63 Comparative Precision between Naïve and Optimal Low Pass Filter Experiments
Figure 7.64 Original and Reduced Designs for Sallen-Key Low Pass Filter
Figure 7.65 MPSRF Posterior Diagnostic for ICG Fluorescence
Figure 7.66 Sample Size Determination for ICG Fluorescence
Figure 7.67 Comparative Precision between Naïve and Optimal ICG Fluorescence Experiments
Figure 7.68 Insensitivity of Model Response to Differences in the k755 Parameter
Figure 7.69 Original and Reduced Designs for ICG Fluorescence Experiment
Figure 7.70 MPSRF Posterior Diagnostic for Anaerobic Threshold
Figure 7.71 Sample Size Determination for the Determination of Anaerobic Threshold
Figure 7.72 Comparative Precision between Anaerobic Threshold Experiments
Figure 7.73 Original and Reduced Designs for the Determination of Anaerobic Threshold

Nomenclature

SYMBOLS FOR EXPERIMENTS
ε            An Experiment
ε*           An Optimally-designed Experiment (over the design space)
ε+           A Traditionally designed (naïve) Experiment
y(x; α, θ)   Experimental Model
g(y; σ)      Variance Model
α            Informative Parameter Vector
α̂            Estimate of the Informative Parameters
θ            Known (fixed) Parameter Vector
σ            Variance Parameter Vector
x            Experiment Design / Model Input
x*           Optimal Experiment Design / Model Input
y            Model Prediction / Output
g            Variance of the Experimental Error
v            Experimental Error
z            Observation from an Experiment
P            Number of Informative Parameters in an Experiment
M            Number of Inputs to an Experiment (dimension)
N            Number of Observations in an Experiment (sample size)
N*           Optimal Sample Size
N+           Traditional (naïve) Sample Size

SYMBOLS FOR BAYESIAN ANALYSIS AND MARKOV CHAINS
p(α)         Prior Distribution of α
p(α | z)     Posterior Distribution of α
ℓ(α | z)     Likelihood Function (estimation/inference format)
ℓ(z | α)     Likelihood Function (observation/design format)
p(z)         Preposterior Distribution
A            Posterior Markov Chain (samples from the posterior distribution)
A_i          i-th Link (P-dimensional vector) of Posterior Markov Chain A
L            Length of a Markov Chain
K            Number of Parallel Markov Chains in Posterior Predictive Simulation

SYMBOLS FOR QUALITY ENGINEERING AND PRECISION
C_α(z)       Control over α for the data set in z
β            Risk
Q            Expected Control of an Experiment
p'_t         Lot Tolerance Fraction Defective (rejectable quality level)
Q_0          Control Requirement for a Random Process
β_0          Maximum Allowable Risk for an Experiment
R            Region of Interest (within quality boundary)
ρ            Precision Tolerance for Region of Interest
A_i ∈ R      Markov Chain Link A_i Falls within Region of Interest, R

SYMBOLS FOR PROBABILITY AND STATISTICS
μ            Mean Vector of a Multivariate Probability Distribution
Σ            Covariance Matrix of a Multivariate Probability Distribution
N(μ, Σ)      Multivariate Normal (Gaussian) Probability Distribution
L(μ, Σ)      Multivariate Lognormal Probability Distribution
U(A, B)      Multivariate Uniform (Rectangular) Distribution between A and B
p(x)         Probability of an Event x
X ~ p(x)     X is a Random Sample from Probability Distribution p
E_α{x}       Expectation of x over the Distribution of α

ALGEBRAIC OPERATORS
log(x)       Natural Logarithm of x
exp(x)       Exponential Function of x (i.e., e^x)
X^T          Transpose of a Matrix X
X^{-1}       Inverse of a Matrix X
det(X)       Determinant of a Matrix X
chol(X)      Cholesky Decomposition of a Matrix X such that chol(X)^T chol(X) = X
I{x}         The Indicator Function: 1 if x is true, 0 otherwise

Abstract

The costs of sampling are often quite high in biomedical engineering and medicine, where collecting data is frequently invasive, destructive, or time-consuming. This results in experiments that are either sparse or very expensive. Optimal design strategies can help a researcher to make the most of a given number of experimental observations, but they neglect the actual problem of sample size determination. For a grey-box experiment with continuous parameter and observation spaces, one must determine how many observations are required in order to ensure precise parameter estimates that resist experimental error and prior uncertainty in the parameter values. This work proposes a novel approach to sample size determination that bridges experimental science with principles of quality engineering and control. A population of parallel Markov chains is simulated from the preposterior distribution to generate posterior predictive distributions for a proposed experiment. This represents a collection of possible posterior distributions for the experiment over the entire observation space. One can compute the estimator precision and determine the optimal sample size as a measure of the probability that the experiment, on average, will fail to yield a necessary degree of estimator precision. This work evaluates the proposed method by applying it to a combination of simulated and practical experiments that validate the utility of the algorithm and examine its properties under various prior distributions and degrees of experimental error. A specialized software package was created to carry out the computations necessary for precision-based sample size determination.

Chapter 1 Introduction

1.1 Motivation

The purpose of scientific investigation is to learn about some aspect of nature, both by acquiring brand-new knowledge and by refining one's prior understanding of a natural system with new information. The researcher eventually learns enough about the behavior of a system to predict its response to any given stimulus with considerable accuracy. At this point, one can apply the findings to solve practical problems. The standard process for studying the natural world, called the scientific method, consists of formulating hypotheses as possible explanations of natural phenomena and conducting experiments that test these predictions for accuracy. These experiments involve observing the particular system under a variety of environmental conditions and stimuli in order to obtain information about it.
The stimulus used in an experiment can be physical, chemical, or temporal, in which one studies the response of a system over the course of time. The experiment design describes the set of stimuli that a researcher selects from which to make observations, which traditionally are distributed linearly or logarithmically over the entire range of input values. From the information provided by the experiment, one can modify the hypothesis and plan further experiments, make generalizations regarding the system's behavior, or evaluate possible applications. Therefore, experimentation is the foundation of all scientific pursuit and the source of all practical knowledge.

The experiments that researchers conduct in the fields of biomedical engineering, medicine, and the life sciences present their own unique set of challenges; sources of experimental error are common and prevalent, and the limitations and difficulties of experimentation ensure that data is sparse and every observation is critical (Bekey and Yamashiro, 1976). For example, a subject may not comply with the experimental regimen, or the time course of an experiment may be very long. The inputs and outputs of the system are often difficult to isolate and observe. Sometimes observations require destructive or invasive sampling, and at other times, the financial burdens and liabilities of the experiment require that its size remain unusually small relative to analogous work in other fields. To complicate things further, the often-minuscule quantities observed typically interlace with noise from sources such as interference from other biological systems, the hum of fluorescent lighting, or signals from cellular telephones. The nature of the field generates a conflict between the experimental error, which requires ample information to provide confidence in a result, and the inability to make many observations; these opposing problems attempt to draw the number of experimental samples in opposite directions. The investigator must not only maximize the amount of information extracted from each observation, but also know precisely how many observations are required to provide enough information to make astute inferences about the system. A method to determine the minimum number of observations required to make confident assertions about a given experiment would streamline experimentation and dramatically improve its efficiency and its time- and cost-effectiveness. This would liberate the researcher from unnecessarily large and inefficient designs and free up valuable experimental resources for further scientific pursuit.

Researchers have responded to the problem of noise in biomedical experimentation by developing methods of optimal design, which employ measures of prior knowledge such as mathematical models and previously obtained data to select a set of observations. An optimal design dictates that the scientist observe the system at locations in the design space that provide enhanced sensitivity to the input stimulus and marginalize the damage caused by experimental error. This ensures that the experiment produces the best possible results for its given sample size. However, even optimal design cannot solve the second problem: determining how many samples an experiment actually requires to provide a result with a given degree of certainty. This mandates that scientists develop additional design methodologies that incorporate the consideration of sample size into the preexisting optimal design framework.
The problem of sample size determination (SSD) for a biomedical experiment is not as trivial as it may initially appear. The literature suggests that an experiment use the fewest observations necessary for the chosen optimal design criterion to work. For example, a D-optimal design requires at least as many measurements as model parameters that are to be estimated using the experimental data (Atkinson and Donev, 1992). Other optimal design criteria may require additional samples as support points (Chaloner and Verdinelli, 1995). However, due to sources of error and prior uncertainty, actual experimental conditions often require many more observations than the bare minimum to produce accurate inferences. Currently, a researcher who wishes to use optimal experiment design techniques must arbitrarily predetermine a "reasonable" number of observations based on the anticipated conditions (Pronzato and Walter, 1993; Song and Wong, 1998). This requires a great deal of experience and skill, and is often subject to error. Consequently, many researchers eschew optimal design techniques in favor of an excess of evenly spaced measurements, in spite of the extra time, effort, and financial expense that these experiments require.

1.2 Objectives and Claims of This Research

This work intends to resolve the conflict in biomedical experimentation between the need for ample information to compensate for the sources of error and the desire to minimize the sample size of the experiment. Such a method would strike a balance between the confidence provided by a large sample size and the reduced cost enjoyed by smaller experiments, filling the current void in experiment design with regard to the number of observations. The ultimate goal of this work is to provide an experimenter with an objective and widely applicable methodology to compute the optimal sample size for a given degree of confidence. This will encourage the use of optimal designs in biomedical experimentation over expensive and clumsy traditional experiment designs. To facilitate the streamlining of the experiment design process and support the proposed SSD algorithm, this work will also make a software package available to the scientific community that can be customized to individual experimental scenarios.

This work presents a novel methodology to compute the optimal sample size for an experiment that ensures a specified confidence in the experimental results. The proposed algorithm combines ideas from posterior predictive simulation using Markov chain Monte Carlo with concepts of control and risk taken from quality engineering. It employs a procedure of sequentially increasing experiment designs to find the smallest sample size that satisfies a pair of precision-based decision criteria, which correspond to the control and risk management of the experiment. For each candidate design, the algorithm uses the prior information to compute a set of parallel Markov chains that simulate potential experimental results. When the algorithm determines that the current sample size is large enough to satisfy prerequisite levels of estimator precision over its possible outcomes, the search terminates; this is the optimal sample size for the experiment. An experiment that uses both the optimal design and the optimal sample size successfully balances the dual goals of maximizing the experimental information provided by each observation and minimizing the number of observations, while simultaneously ensuring valid experimental results.
To ensure that one can readily apply the algorithm to the maximum range of experiments, it was designed to adhere to three key specifications. Most importantly, it is applicable to continuous observation spaces; prior methods of sample size determination only apply to experiments with binary outcomes, which severely limits their utility. Second, it is purely nonlinear and considers nonlinear experimental and variance models, not simply linear approximations of nonlinear models. This ensures its accuracy across a wider variety of models than if it hinged on linear approximation, which often breaks down. Finally, it avoids any ethical dilemmas by shunning any measure of financial cost or economic utility; the proposed SSD algorithm finds the sample size that provides a specified degree of estimator precision and focuses on the results that will be seen, not the resources that could be saved. This also allows this work to sidestep the difficult task of assigning an acceptable price or value to a given amount of information.

To validate the claims of this work regarding the precision of experimental results, to study the behavior of the proposed method, and to ensure its suitability in the laboratory, this work evaluates the algorithm using a number of simulated and real-world experimental environments in two distinct stages. The first stage uses simulated experimental trials to confirm the primary claim that the reduced sample size can provide the necessary estimator precision and studies the behavior of the SSD algorithm, particularly its response to changes in measurement error and prior uncertainty in the model parameters. The second stage of evaluation applies the proposed SSD algorithm to practical experiments and compares the results provided by the optimally-designed experiment to those obtained using the traditional design. These findings establish the notion that the proposed method for sample size determination is a viable replacement for the traditional design in biomedical experimentation for circumstances when prior information is available to the researcher.

1.3 Overview and Structure of Paper

This dissertation is divided into eight chapters. This chapter, the first, discusses the motivations behind this work and covers the specific objectives and claims of the presented research. The second chapter provides an in-depth explanation of experimentation, covering design methods, the Bayesian framing of an experiment (including prior distributions and variance models), and the analysis of experimental data. Chapter three examines the accumulation of experimental information as the sample size increases and reviews the prior art in sample size determination. The fourth chapter focuses on the novelties of this work, including the use of Markov chains to simulate posterior predictive distributions and the application of ideas from quality engineering to determine the optimal sample size. Finally, the chapter describes the proposed decision rule for sample size determination. A software package called XRedDesign was developed in the C++ programming language to carry out each of the steps in this procedure. This allows a researcher to determine the optimal sample size for a wide variety of experimental and variance models. The fifth chapter discusses the critical aspects of this program, including its architecture and the validation of its algorithms. The sixth chapter summarizes the procedure to evaluate the proposed SSD algorithm and describes the various experimental models that are used.
The penultimate chapter presents the results from both stages of the evaluation and includes a discussion of the behaviors, advantages, and limitations of the algorithm. Finally, the concluding chapter summarizes this work and recommends some potential directions for future work.

Chapter 2 Design of Experiments and Sample Size Determination

2.1 Fundamentals of Experimentation

Not all experiments are created equal; depending on the maturity of a given field of research, a given experiment may demonstrate varying degrees of prior information in the form of model equations, physical constraints, and previous experimental results. In other cases, a scientist may know very little about the system of interest and must conduct a black-box experiment, which investigates the relationships between an input stimulus and the system response with little or no understanding of the underlying dynamics. Since black-box experiments cannot be streamlined or optimized, one must employ a very systematic and rigid approach, using a traditional design with a large number of observations. The investigator then attempts to correlate the system inputs and outputs using nonparametric approaches such as the autoregressive moving average (Box and Jenkins, 1994). Although the correlation of input and output does not imply that one causes the other, relating them in this way allows a scientist to formulate some initial hypotheses whose assumptions provide a degree of prior information that can be used to design subsequent experiments.

Although experimentation involves studying the unknown, a wise investigator can often avoid the black-box scenario by using science and mathematics to provide some preliminary information about the system of interest. In particular, one can characterize the relationship between the input and output of a system in terms of an experimental model, which relates a stimulus to its response using mathematical expressions and a collection of variables called parameters. One derives the experimental model function from an understanding of the relationships between the various physical and chemical factors that are believed to be at work in the system, such as resistance, capacitance, and inductance. The parameters of a given model belong to two classes: informative parameters, which are associated with some degree of uncertainty and must be determined by the experiment, and fixed parameters, which have known values defined by natural laws, determined by repeated prior experimentation, or which describe the system response without stimulation (i.e., the initial condition or baseline measurement). The researcher uses a combination of the experimental model and the data to determine the values of the parameters, which then allow the accurate prediction of the system response to new stimuli.

Experiments in which the researcher assumes an experimental model and must determine the parameters are called grey-box experiments. These experiments take advantage of the numerical framework that mathematical modeling provides and reduce the objective of the experiment to estimating the model parameters. If the parameter estimates provide a good fit between the observed data and the experimental model, one can accurately predict the future behavior of the system for arbitrary input stimuli.
Optimal designs prove very useful to these experiments, since they identify the parameters accurately and efficiently by tailoring the input stimuli to the specific needs of the experimental model and to the degree of parameter uncertainty. Traditional experiment designs usually fail to incorporate the behavior of the model into their structure and consequently allocate experimental resources in a very inefficient manner, leading to observations that provide little knowledge about the model parameters. While the nonparametric analyses performed on black-box experiments cannot predict the response of a system far away from its data points, the mathematical modeling of a natural system provides the researcher with a continuum of response predictions, including those that are very distant from the actual data points. This approach allows the optimization of an experiment and is well suited for experiments involving sparse data.

The block diagram in Figure 2.01 depicts a typical grey-box experiment. The researcher applies a set of M-dimensional input stimuli to the system of interest and records observation data. The value of M depends on the specific experimental model. The experimenter repeats the process until N observations have been made according to the N-by-M design matrix, x, which consists of N M-dimensional "points" in the design space. Whenever the model requires a single input (i.e., M = 1), x reduces to a vector of length N. The predicted response of the system according to the experimental model is represented as y, while z describes the actual observation made by the researcher. The i-th observation in an experiment relates to the corresponding model prediction, with informative and fixed parameter vectors α and θ respectively, as

$$ z_i(x_i;\alpha,\theta) = y(x_i;\alpha,\theta) + v_i, \qquad (2.01) $$

where v_i represents the experimental error, which is typically Gaussian white noise with zero mean and a variance described by the variance model, g, with its own parameter vector, σ. This framework represents an observation as a normally distributed random variable of mean y and variance g at the current stimulus, x_i,

$$ z_i \sim N\!\left( y(x_i;\alpha,\theta),\; g\!\left(y(x_i;\alpha,\theta);\sigma\right) \right), \qquad (2.02) $$

where an experiment is the realization of independent events corresponding to the N observations. By treating an experiment as a random process, the researcher can exploit certain concepts, such as estimator precision, that do not fit soundly into a strictly deterministic experimental framework.

Figure 2.01: A Grey-box Experiment. In the grey-box scenario, a given observation is expressed as the sum of an experimental model prediction, y, and an error term, v, whose variance is defined by a variance model. The system accepts three parameter vectors, which correspond to the model informative and fixed parameters (α and θ) and the variance model parameters (σ). The design matrix has N M-dimensional inputs, which are observed independently. The observation vector, z, has length N.
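As a concrete illustration of the observation model in (2.01) and (2.02), the short C++ sketch below simulates a small set of observations for a hypothetical two-parameter exponential-decay model under a constant error variance. The model function, design points, and all numerical values are assumptions made for this example only; they are not the models or settings used elsewhere in this work.

```cpp
// Minimal sketch of the grey-box observation model in (2.01)-(2.02):
// z_i = y(x_i; alpha) + v_i, with v_i ~ N(0, g(y; sigma)).
// The exponential-decay model and every number here are illustrative assumptions.
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

// Experimental model y(x; alpha): a simple two-parameter exponential decay.
double model(double x, const std::vector<double>& alpha) {
    return alpha[0] * std::exp(-alpha[1] * x);
}

// Constant variance model: g(y; sigma) = sigma0^2.
double varianceModel(double /*y*/, double sigma0) { return sigma0 * sigma0; }

int main() {
    std::vector<double> alpha  = {10.0, 0.5};               // "true" informative parameters
    std::vector<double> design = {0.5, 1.0, 2.0, 4.0, 8.0}; // N = 5 design points, M = 1
    double sigma0 = 0.3;

    std::mt19937 rng(42);
    for (double x : design) {
        double y = model(x, alpha);                                            // model prediction
        std::normal_distribution<double> v(0.0, std::sqrt(varianceModel(y, sigma0)));
        double z = y + v(rng);                                                 // observation per (2.01)
        std::printf("x = %5.2f   y = %7.4f   z = %7.4f\n", x, y, z);
    }
    return 0;
}
```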
2.2 Bayesian Experimentation

2.2.1 The Bayesian Framework

Since each observation in (2.01) is a random variable, there will always be some uncertainty in the parameter estimates; therefore, it is often useful to define the parameters as probability distributions instead of by point estimates. Bayesian experimentation builds on the grey-box framework and describes the process of employing Bayes' Theorem to update one's knowledge of the parameter distributions in light of experimental results.

A researcher begins with some vague notion of the parameter values; this knowledge can be expressed as a probability density p(α), called the prior distribution. After an experiment has been carried out and the results analyzed, the researcher updates the beliefs about the parameter values, which are expressed as another probability density, p(α|z). This function is known as the posterior distribution because it represents the parameter beliefs after, or posterior to, the experiment. Bayes' Theorem, defined as

$$ p(\alpha\,|\,z) = \frac{\ell(z\,|\,\alpha)\, p(\alpha)}{\int \ell(z\,|\,\alpha)\, p(\alpha)\, d\alpha}, \qquad (2.03) $$

unites these two probability distributions through the likelihood function, ℓ(z|α), which uses a series of observations, z, to update one's beliefs about the parameters. The denominator of this expression reduces to a proportionality constant over α that normalizes the area of the posterior distribution to one. Since (2.03) expresses the parameter values as random variables belonging to the appropriate distribution, the researcher can quantify the degree of uncertainty in the parameter estimates at any time by using the estimator precision and can assess the value of an experiment based on the decrease in uncertainty it generates.

The keystone of Bayesian experimentation is the likelihood function, which relates the prior and posterior distributions through the experimental data and provides the means to update one's beliefs about the parameters. In the grey-box context, the likelihood for an experiment is defined as the coincidence of the N independent observations for a given value of the parameter vector, α. Recalling (2.02), as long as the experimental errors at each observation are normally distributed (i.e., Gaussian white noise), the probability of observing a given measurement, z_i, when the system exhibits the parameter vector α is defined as

$$ p(z_i;\alpha) = \frac{1}{\sqrt{2\pi\, g\!\left(y(x_i;\alpha,\theta);\sigma\right)}} \exp\!\left( -\frac{\left(z_i - y(x_i;\alpha,\theta)\right)^2}{2\, g\!\left(y(x_i;\alpha,\theta);\sigma\right)} \right), \qquad (2.04) $$

which is simply the normal probability with mean y and variance g. Since basic statistical theory states that the probability of the occurrence of a group of independent events is equal to the product of the individual event probabilities, one can express the likelihood of the data vector, z, as

$$ \ell(z\,|\,\alpha) = \prod_{i=1}^{N} p(z_i;\alpha). \qquad (2.05) $$

The likelihood function at a parameter vector α after observing z is expressed by combining (2.04) and (2.05). By multiplying the likelihood at each α with its appropriate prior probability, one can compute the posterior probability of that parameter.
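To show how (2.03) through (2.05) combine in a computation, the sketch below evaluates the unnormalized log-posterior, log p(α) plus the sum of the observation log-probabilities, for the same illustrative exponential-decay model; working with logarithms avoids numerical underflow as N grows. The independent normal priors, data values, and model are assumptions made for this example and are not part of the XRedDesign package.

```cpp
// Sketch: unnormalized log-posterior, log p(alpha|z) up to a constant equals
// log p(alpha) + sum_i log p(z_i; alpha), following (2.03)-(2.05) with the
// Gaussian observation probability (2.04). All values are illustrative.
#include <cmath>
#include <cstdio>
#include <vector>

constexpr double kPi = 3.14159265358979323846;

struct Datum { double x, z; };

double model(double x, const std::vector<double>& a) { return a[0] * std::exp(-a[1] * x); }

// Log of the normal density N(z; mean, var), as used in (2.04).
double logNormalPdf(double z, double mean, double var) {
    return -0.5 * std::log(2.0 * kPi * var) - (z - mean) * (z - mean) / (2.0 * var);
}

// Log-likelihood (2.05): sum of the independent observation log-probabilities.
double logLikelihood(const std::vector<Datum>& data, const std::vector<double>& a, double sigma0) {
    double ll = 0.0;
    for (const Datum& d : data)
        ll += logNormalPdf(d.z, model(d.x, a), sigma0 * sigma0);
    return ll;
}

// Independent normal priors on each parameter (an assumption for this sketch).
double logPrior(const std::vector<double>& a) {
    return logNormalPdf(a[0], 10.0, 4.0) + logNormalPdf(a[1], 0.5, 0.04);
}

int main() {
    std::vector<Datum> data = {{0.5, 7.9}, {1.0, 6.2}, {2.0, 3.5}, {4.0, 1.4}, {8.0, 0.2}};
    std::vector<double> alpha = {10.0, 0.5};
    double logPost = logPrior(alpha) + logLikelihood(data, alpha, 0.3);
    std::printf("unnormalized log-posterior at alpha = (10, 0.5): %f\n", logPost);
    return 0;
}
```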
2.2.2 Defining the Prior Distribution

The selection of the prior distribution is one of the most critical aspects of Bayesian experimentation, since this describes precisely what one knows about the parameters prior to the experiment. In addition to determining the proper mean and covariance for the prior, the researcher must also determine which distribution shape best describes the nature of the parameter information. A sequential experiment may use the posterior of the previous trial as the prior for the next; however, one must usually originate a prior distribution based purely on assumption. The simplest way to define the prior distribution is to establish which of the standard probability distributions best reflects the character of the prior information, and then determine a mean and covariance that describe that information. It is especially important to use a probability density with a multivariate extension for instances where a model contains more than a single informative parameter (Miller, 1975). Figure 2.02 illustrates three common prior distributions used in this work.

Figure 2.02: Univariate and Bivariate Prior Distribution Types. Bayesian experimentation commonly employs one of three prior distribution shapes described by its mean vector and covariance matrix. Each exhibits a different a priori understanding of the informative parameters: the normal distribution implies that the parameter takes on a given value with unbiased uncertainty, the lognormal distribution indicates a constraint to positive values, while the uniform distribution represents a weak belief that the parameter falls between the limits without preference to any specific value.

THE NORMAL (GAUSSIAN) DISTRIBUTION

The normal prior distribution provides a good description of the uncertainty when the researcher has a general belief that a parameter takes on a value at a given parameter vector and that the uncertainty is equally distributed to either side of this value. The covariance of the distribution defines the uncertainty about the model parameters; a small covariance indicates a great deal of confidence in the initial guess and provides a sharp peak, while a large covariance indicates a great deal of uncertainty and leads to a low, smooth curve. The P-variate normal distribution is defined as

$$ N(x;\mu,\Sigma) = \frac{1}{\sqrt{(2\pi)^P \left|\Sigma\right|}} \exp\!\left( -\tfrac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu) \right), \qquad (2.06) $$

where μ and Σ represent the mean vector and covariance matrix of the distribution. The normal prior distribution has a few key features. First, it permits the parameter values to take on both positive and negative values, which provides a great deal of flexibility. In addition, the normal prior does not have any boundary conditions, which ensures that it will represent the "true" parameter value of the system with a nonzero probability. Therefore, the normal prior distribution is a good choice in general when the researcher has a good initial guess regarding the parameter values, but little other information.
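The multivariate normal density in (2.06) is straightforward to evaluate numerically once its covariance has been factored. The sketch below computes log N(x; μ, Σ) for a two-parameter example using a Cholesky factorization, in the spirit of the chol operator listed in the nomenclature; the mean, covariance, and evaluation point are placeholder values chosen only for illustration.

```cpp
// Sketch: evaluating the P-variate normal log-density of (2.06) with a
// lower-triangular Cholesky factor L such that L * L^T = Sigma.
// The two-parameter mean, covariance, and test point are illustrative only.
#include <cmath>
#include <cstdio>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Lower-triangular Cholesky factor of a positive-definite matrix.
Matrix cholesky(const Matrix& S) {
    size_t P = S.size();
    Matrix L(P, std::vector<double>(P, 0.0));
    for (size_t i = 0; i < P; ++i)
        for (size_t j = 0; j <= i; ++j) {
            double sum = S[i][j];
            for (size_t k = 0; k < j; ++k) sum -= L[i][k] * L[j][k];
            L[i][j] = (i == j) ? std::sqrt(sum) : sum / L[j][j];
        }
    return L;
}

// log N(x; mu, Sigma): solve L*w = (x - mu) by forward substitution, then use
// the quadratic form w^T w and log|Sigma| = sum of 2*log(L_ii).
double logMvnPdf(const std::vector<double>& x, const std::vector<double>& mu, const Matrix& Sigma) {
    size_t P = x.size();
    Matrix L = cholesky(Sigma);
    std::vector<double> w(P);
    double quad = 0.0, logDet = 0.0;
    for (size_t i = 0; i < P; ++i) {
        double s = x[i] - mu[i];
        for (size_t k = 0; k < i; ++k) s -= L[i][k] * w[k];
        w[i] = s / L[i][i];
        quad += w[i] * w[i];
        logDet += 2.0 * std::log(L[i][i]);
    }
    const double kPi = 3.14159265358979323846;
    return -0.5 * (P * std::log(2.0 * kPi) + logDet + quad);
}

int main() {
    std::vector<double> mu = {10.0, 0.5};
    Matrix Sigma = {{4.0, 0.0}, {0.0, 0.04}};
    std::vector<double> x = {9.0, 0.6};
    std::printf("log N(x; mu, Sigma) = %f\n", logMvnPdf(x, mu, Sigma));
    return 0;
}
```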
This distribution also has a long tail at the positive end of the parameter space, which must be handled carefully in order to avoid an overly diffuse prior distribution. In general, this prior distribution is a good alternative to the normal prior when one knows that the parameter values must take on positive values and wishes to ensure that the Bayesian framework reflects this. Most of the experiments studied in this work employ strictly positive parameters and utilize the lognormal prior distribution to describe their uncertainties.

THE UNIFORM (RECTANGULAR) DISTRIBUTION

Occasionally, the researcher can only express a weak belief regarding the parameters for a given experiment: only that the parameter falls between two boundary values that can be determined by mathematical derivation or by physical constraints of the experimental model. The uniform prior distribution is a good choice for this degree of parameter uncertainty. Since all values between the endpoints of this distribution are equally likely to occur, this strategy affords the experimenter the ability to claim that the correct parameter value is "in there somewhere" without committing to any single value over another. One can define the P-variate uniform distribution using lower and upper boundary vectors X_0 and X_1 as

    U(x; X_0, X_1) = \prod_{i=1}^{P} \begin{cases} \dfrac{1}{X_{1,i} - X_{0,i}} & X_{0,i} < x_i < X_{1,i} \\ 0 & \text{otherwise} \end{cases}   (2.10)

The mean vector and diagonal covariance matrix for this distribution are

    \mu = \frac{X_0 + X_1}{2}   (2.11)

and

    \Sigma = \mathrm{diag}\!\left\{ \frac{(X_1 - X_0)^2}{12} \right\} .   (2.12)

While the uniform distribution is a relatively safe choice of prior distribution, one should still take care when implementing it; since the probability is zero outside of the boundary conditions, the design and subsequent Bayesian analysis will be useless if the actual parameter value falls beyond these boundaries. One should only use the uniform prior as a last resort, when very little prior information is known about the parameter values and no other distribution can adequately describe the uncertainty.

Once the researcher chooses a shape for the prior distribution, the mean and covariance of the density must be determined in a way that adequately describes the parameter uncertainty. The best way to carry this out is by studying the results of other experiments on similar systems, but educated guesses about the parameter values are also valuable in extreme cases. The researcher must determine the 95 percent confidence or credibility interval for the parameters; this provides the required range of the parameter values in the prior distribution. For the uniform distribution, this process is straightforward, since the boundary endpoints at X_0 and X_1 define a fixed interval. For the normal prior, one takes the mean to be at the center of this range, and since the 95 percent confidence interval is covered by two standard deviations on either side of the mean, an appropriate covariance matrix can be computed by dividing the range by four and squaring this value for each parameter. The off-diagonal elements of the covariance matrix can be included if the researcher finds evidence that the parameters might be correlated. The lognormal distribution requires the most manipulation to determine an appropriate mean and covariance.
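For the normal prior, the range-to-covariance rule just described can be sketched as follows; this is a minimal example assuming independent parameters, and the elicited ranges are hypothetical.

    import numpy as np
    from scipy import stats

    # Hypothetical 95% ranges elicited for two parameters.
    lower = np.array([0.5, 10.0])
    upper = np.array([2.5, 30.0])

    mu = 0.5 * (lower + upper)        # center of each range
    sd = (upper - lower) / 4.0        # two standard deviations on each side cover ~95%
    Sigma = np.diag(sd**2)            # diagonal covariance; add off-diagonals if correlated

    # Check: probability mass each marginal places inside its elicited range (~0.954).
    coverage = stats.norm.cdf(upper, mu, sd) - stats.norm.cdf(lower, mu, sd)
    print(mu, np.diag(Sigma), coverage)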
A reasonable method for determining the lognormal parameters is to convert the data range into normal space by taking the natural logarithm of its endpoints and defining a normal-space mean, u, and a normal-space covariance, S, using the same method as with the normal prior. Then, one can determine the mean and covariance in lognormal space using the relations in (2.08) and (2.09). Finally, one should verify the density by integrating the lognormal distribution with parameters (μ, Σ) between the endpoints of the range to ensure that this value is equal to 0.95.

2.2.3 Variance Models

The grey-box definition in (2.01) attributes the difference between the model prediction, y, and the actual observation, z, to the experimental error, represented by a series of residuals at each observation, v. When the experiment employs an adequate experimental model, these error terms are random and unbiased. The error in an experiment flows from two primary sources: the measurement error is dependent on the magnitude of the observation and results from human error or the imprecision of the measuring device, while the baseline error is a consequence of the experimental environment and is homogeneous throughout the observation space. The variance model, g, quantifies these error sources so that the theoretical probability density of the observation variable in (2.02) matches the behavior of the experimental data as closely as possible. In other words, the variance model predicts the discrepancy between the model prediction and the observed data. A number of variance models exist, and a researcher must take care to choose one that adequately describes the experiment without overestimating the error; assuming too large an error variance dramatically inhibits one's ability to optimize the experiment, while underestimating the variance of the residuals leaves the experiment vulnerable to noise that can compromise its integrity.

THE CONSTANT VARIANCE MODEL

The simplest model assumes that the variance is constant across the observation space:

    g(y(x; \alpha, \theta); \sigma) = \sigma_0^{2} .   (2.13)

This model accounts for only the baseline error and assumes that the measurement error is negligible for the experiment. Under this model, one considers the experimental error independent of the magnitude of the observation - an assumption that might not hold for models with a wide range of response magnitudes. However, this variance model does not affect the resulting optimal designs and is analytically friendly, which makes it extremely useful in instances where an investigator requires a simple analytical solution to a design problem. The constant variance model was popular prior to the widespread use of the digital computer; researchers often refer to this model in the early design literature (Box and Lucas, 1959) for its computational and analytical simplicity.

THE POWER VARIANCE MODEL

More often, one desires a variance model that emphasizes the measurement error, which varies with the observation magnitude; smaller measurements exhibit different error than larger measurements. The power variance model,

    g(y(x; \alpha, \theta); \sigma) = \sigma_0^{2}\, y(x; \alpha, \theta)^{2\lambda} ,   (2.14)

is often used in the design literature (Box and Hill, 1974; Bezeau and Endrenyi, 1986; Khinkis et al., 2003) to satisfy this requirement. The value of λ typically ranges from zero to one; when λ equals zero, this model reduces to (2.13) and the experimental error has a constant variance of σ_0^2.
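A small illustrative sketch of how these two variance models might be coded, with the power model written as σ_0^2 |y|^{2λ} as in (2.14); the function names and numerical values are hypothetical, and the absolute value simply guards against negative predictions.

    import numpy as np

    def constant_variance(y, sigma0):
        # (2.13): the error variance is the same at every observation.
        return np.full_like(np.asarray(y, dtype=float), sigma0**2)

    def power_variance(y, sigma0, lam):
        # (2.14): the variance scales with the magnitude of the model prediction;
        # lam = 0 recovers the constant model.
        return sigma0**2 * np.abs(np.asarray(y, dtype=float))**(2.0 * lam)

    y_pred = np.array([0.2, 1.0, 5.0])        # hypothetical model predictions
    print(constant_variance(y_pred, 0.1))     # [0.01  0.01  0.01]
    print(power_variance(y_pred, 0.1, 0.5))   # grows with |y|: [0.002  0.01  0.05]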
Increasing the value of λ either increases or decreases g, depending on whether the magnitude of y is greater or less than one, which leads to inconsistencies in its implementation across different models. Using this model almost always requires a computer to perform the computations; because (2.14) includes the model prediction in its definition, analytical expressions for experiments using this variance model become intractable very quickly for all but the simplest experimental models.

THE PARABOLIC (QUADRATIC) VARIANCE MODEL

The most complete variance model considered here, which incorporates error from both baseline and measurement sources, is the parabolic (quadratic) variance model, in which one expresses the standard deviation as a straight line with an intercept:

    g(y(x; \alpha, \theta); \sigma) = \left( \sigma_1 + \sigma_0\, y(x; \alpha, \theta) \right)^{2} .   (2.15)

This model gets its name from the fact that its plot traces a parabola as a function of y. The σ_1 term of this model accounts for the baseline noise that may occur in the background of an experiment regardless of the magnitude of the observation, while the σ_0 term accounts for the measurement error. By setting either of the variance parameters to zero, the researcher can transform (2.15) into a variance model that is purely baseline, as in (2.13), or purely measurement, as in (2.14). Also, unlike the power variance model, the variance predicted by this model always increases with the magnitude of the model prediction. The incorporation of both baseline and measurement error makes this variance model useful in a wide variety of experimental situations, and it is the model used for most of this work.

2.3 Optimal Experiment Design

Researchers have devised a number of ways to design experiments, some better than others. A typical experiment design consists of a linearly- or logarithmically-spaced sequence of observations across the observation space. After collecting data at each of these locations, the researcher can use regression techniques to fit the response to a model, or simply "connect the dots" to quantify their pattern. However, the observations from this type of design are likely to overlap one another's information, or to contribute information already provided by the models and prior distributions. This work refers to this type of design as the naïve experiment design, since it is often used by researchers without consideration of the prior information of the experiment. Unfortunately, in spite of other, superior experiment design methods, the naïve method remains the default standard of experiment design.

2.3.1 Design Criteria

The naïve design works effectively in the black-box scenario, but one often carries out an experiment under grey-box conditions in which the researcher already has some knowledge about the system of interest, characterized by the experimental model and prior distribution. Optimal experiment design describes the process of combining this prior knowledge with information theory to maximize the amount of information that one learns about the model parameters with a given number of observations. Assuming that the model and prior knowledge accurately describe the system under study, an optimally designed experiment will always provide better inferences about the parameter values than a naïve design of the same size. Researchers have proposed different criteria for optimal design based on various measures of experimental information.
The majority of these criteria are based on the Fisher information matrix, which expresses the information that one expects to gain from an experiment with design matrix x as

    F^{T} W^{-1} F ,   (2.16)

where W is an N-by-N diagonal matrix whose elements represent the variance at each observation and F is the N-by-P matrix defined as

    F = \begin{bmatrix} \dfrac{\partial y(x_1; \alpha, \theta)}{\partial \alpha_1} & \cdots & \dfrac{\partial y(x_1; \alpha, \theta)}{\partial \alpha_P} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial y(x_N; \alpha, \theta)}{\partial \alpha_1} & \cdots & \dfrac{\partial y(x_N; \alpha, \theta)}{\partial \alpha_P} \end{bmatrix} .   (2.17)

Researchers often refer to this as the sensitivity matrix, since it indicates how much individual changes to the parameter values affect the response of the model. The Fisher information matrix is formally defined as the variance of the score function (the gradient of the log-likelihood) of the observations at x (Fisher, 1950; Seber and Wild, 1989), but it effectively represents the information as the balance between how much each observation tells about the parameters and the reliability of each observation given the experimental error.

The D-optimal design (Box and Lucas, 1959; Bezeau and Endrenyi, 1986) is the simplest of the Fisher-based design criteria, and minimizes the determinant of the inverse Fisher information matrix at some nominal value for the parameters. This design consists of the design matrix x that globally minimizes the functional value of M, defined as

    M_{D}(x; \alpha) = \det\!\left[ \left( F^{T} W^{-1} F \right)^{-1} \right] ,   (2.18)

over the design space. The value of the parameter vector, α, is the experimenter's initial guess of the parameter values, and it has a profound effect on the resulting design. This presents the experimenter with a paradox: if the initial guess is not accurate, the experiment will fail to estimate the parameters correctly, yet a good initial guess tends to obviate the need to conduct the experiment at all. This limits the utility of D-optimality somewhat; it is useful for null-hypothesis testing, where the researcher tests whether the parameter values agree or disagree with the initial guess, but not in circumstances where the parameters are to be determined from a wide continuum of possible values. The minimum number of observations required for any D-optimal design is always equal to the number of parameters, P, but the sample size may be as large as desired. Designs for which the sample size, N, is larger than P will always include replications of the first P design points to support the original design.

The requirement that one have a reasonable guess regarding the model parameters before executing an experiment turns out to be a critical shortcoming of D-optimality. Consequently, researchers have developed additional design criteria that alleviate this burden by computing the optimal design over a distribution of parameter values instead of a single value. While these methods are not strictly Bayesian, they do make use of the prior distribution as a measure of the parameter uncertainty to compute the optimal design. The EID-optimality criterion (Walter and Pronzato, 1987) incorporates parameter uncertainty into the design procedure by minimizing the expectation of the D-optimal criterion over the prior distribution of the parameters:

    M_{EID}(x) = \mathrm{E}_{p(\alpha)}\!\left\{ \det\!\left[ \left( F^{T} W^{-1} F \right)^{-1} \right] \right\} .   (2.19)

Since α is defined on a probability distribution instead of at a point value, this involves computing the expected information over a P-variate density.
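As a concrete illustration, here is a minimal sketch of evaluating the D-optimality score (2.18) for one candidate design, assuming an illustrative one-compartment decay model y = α_1 exp(-α_2 x) with constant observation variance; the model, helper names, and numbers are hypothetical. Averaging this score over weighted draws or nodes from the prior would give the EID score of (2.19).

    import numpy as np

    def sensitivity_matrix(x, alpha):
        # Illustrative model: y(x; alpha) = alpha[0] * exp(-alpha[1] * x).
        # Analytical partial derivatives with respect to each parameter, as in (2.17).
        a1, a2 = alpha
        dy_da1 = np.exp(-a2 * x)
        dy_da2 = -a1 * x * np.exp(-a2 * x)
        return np.column_stack([dy_da1, dy_da2])      # N-by-P

    def d_criterion(x, alpha, sigma0=0.1):
        # (2.18): determinant of the inverse Fisher information matrix,
        # with W the diagonal matrix of observation variances (constant model here).
        F = sensitivity_matrix(x, alpha)
        W_inv = np.eye(len(x)) / sigma0**2
        fisher = F.T @ W_inv @ F
        return np.linalg.det(np.linalg.inv(fisher))

    design = np.array([0.5, 3.0])         # candidate design: two observation times
    alpha_guess = np.array([1.0, 0.7])    # nominal parameter values (initial guess)
    print(d_criterion(design, alpha_guess))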
Unlike D-optimal designs, optimal designs computed from this criterion may either replicate the first P design points or spread out over the design space, depending on the covariance of the prior distribution. This spreading behavior is often preferable to repeated measurements. Another method, the ED-optimality criterion, employs a similar strategy by maximizing the expected Fisher information over the prior distribution:

    M_{ED}(x) = \mathrm{E}_{p(\alpha)}\!\left\{ \det\!\left[ F^{T} W^{-1} F \right] \right\}   (2.20)

(Pronzato and Walter, 1985). While these criteria are very similar to one another, they actually produce different optimal designs for the same experiments and prior distributions, because the expectation acts on different operands. Unlike designs computed by the EID criterion, these optimal designs replicate the first P design points.

The actual design points produced by an optimal design vary between design criteria, which may also affect the decision to use a given criterion over its competitors. Each of the optimal design criteria generates P unique design points that vary based on the covariance and shape of the prior distribution. As previously mentioned, for sample sizes greater than P, the D-optimal design consists of replications of the original P design points. This behavior also holds for the ED-optimal design. However, the EID criterion spreads out a unique set of design points for sample sizes that exceed the number of parameters. These additional design points remain close to the original P points, but provide support without strictly replicating the data. However, when the covariance of the prior distribution is very small or the sample size is very large, these designs can exhibit a small degree of replication. This is both understandable and acceptable, since an experiment that uses a small prior distribution is unlikely to require a large sample size, and experiments that require large sample sizes often include a great deal of experimental error, which can be mitigated by replicating observations. The distribution of design points for different optimality criteria is illustrated in Figure 2.03, which shows the optimal designs at N = 2P for the exponential rise and fall (Eq. 6.02) and sigmoid (Eq. 6.03) experimental models using the indicated prior distributions and variance parameters.

This work exclusively uses the EID-optimality criterion to generate optimal designs, for some key reasons. First, incorporating the prior distribution into the optimal design criterion is critically important to Bayesian experimentation, since one's initial knowledge of the parameter values is unlikely to be sufficient to use the D-optimal criterion. This is especially true as the number of parameters increases, since this adds dimension to the parameter space and increases the necessary accuracy of the initial guess. Next, this criterion is consistent with the D-optimal criterion in terms of its structure; the EID-optimal criterion (2.19) is simply the D-optimal criterion (2.18) taken in expectation over the prior. This provides a welcome continuity for researchers who are already familiar with D-optimal design; the EID-optimal design can be described as a natural extension of the D-optimal design criterion that incorporates uncertainty into the parameter values. Finally, EID-optimal designs do not replicate their design points. Since this work must apply to the widest possible variety of experimental scenarios, it is essential to avoid excessive replication.
For example, experiments that employ a temporal input are incompatible with designs that replicate inputs. Furthermore, a spread-out optimal design is more intuitive to the researcher who is accustomed to the naïve approach than a design that consists entirely of replications of a base set of samples.

Figure 2.03: Optimal Designs for Some Optimal Design Criteria. The D-, ED-, and EID-optimal design criteria can yield dramatically different results. Here, the optimal designs for the Rise and Fall (6.04) and Hill Sigmoid (6.07) models are displayed for these three criteria. The D and ED designs replicate the same base set of design points, while the EID design tends to spread its design points, only occasionally replicating in critical regions. This spreading behavior is critical to its practical implementation.

The EID- and ED-optimal design criteria account for the uncertainty in the experimental parameters by computing the expectation of the criterion over the prior distribution of the parameters. The Monte Carlo method, which involves drawing a large random sample from the prior distribution, evaluating the function of interest at those parameter values, and then computing the average, is the simplest to implement. However, this method exhibits a critical flaw in the optimization scenario. Since any optimization algorithm achieves its objective by iteratively evaluating the target criterion and modifying the search accordingly, the criterion surface must be consistent between evaluations (i.e., the sampled representation of α must be stationary); otherwise, the global optimizer ends up blindly searching for a moving target. When there is more than a single parameter, providing a stationary representation of the prior distribution in this way requires an excessively large random sample. Therefore, rather than use a stochastic method, this work computes the expectations in (2.19) and (2.20) by producing a discrete representation of the prior density with T nodes that remains stationary across multiple evaluations (Figure 2.04). These nodes are arranged into a mesh, where each parameter vector, α_i, represents a different parameter combination and has a weight, w_i, that corresponds to the discrete probability of α_i. One can deterministically compute the expectation over the mesh using the definition of the expectation operator for a discrete probability density:

    \mathrm{E}_{p(\alpha)}\{ f(x; \alpha) \} = \sum_{i=1}^{T} w_i \cdot f(x; \alpha_i) .   (2.21)

A stationary distribution built in this way is more reliable and uses fewer samples than a stochastic method. This works well for experimental models with a small number of parameters, since adding dimensions to α quickly increases the T required to compute the expectation accurately. However, since the optimal designs used in this work do not employ experimental models with more than three informative parameters, the parameter mesh provides a good deterministic method for computing the value of the expectation operator.

Figure 2.04: Parameter Mesh for Computing the Expectation in an Optimal Design. Rather than computing the expectation for an optimal design analytically, the mesh forms a discrete representation of the prior distribution (red circles) at evenly or logarithmically spaced intervals. For each "node" in the mesh, the algorithm computes a weight from the prior pdf and the grid volume. This weight corresponds to the discrete probability of the parameter value at the node. As long as the mesh contains a sufficient number of nodes, the expectation is the weighted average of the objective function over the population of nodes.
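A minimal sketch of the weighted-mesh expectation in (2.21), assuming a two-parameter lognormal prior evaluated on a regular grid; the grid resolution, prior parameters, and stand-in objective below are all illustrative.

    import numpy as np
    from scipy import stats

    # Hypothetical normal-space parameters of a two-parameter lognormal prior.
    u = np.array([0.0, -0.7])
    S = np.diag([0.04, 0.09])

    # Regular grid of T nodes over each parameter's range.
    g1 = np.linspace(0.5, 2.0, 30)
    g2 = np.linspace(0.2, 1.2, 30)
    A1, A2 = np.meshgrid(g1, g2)
    nodes = np.column_stack([A1.ravel(), A2.ravel()])     # T-by-P parameter mesh

    # Weight of each node: prior density times grid cell volume, renormalized to sum to one.
    cell = (g1[1] - g1[0]) * (g2[1] - g2[0])
    pdf = stats.multivariate_normal.pdf(np.log(nodes), u, S) / nodes.prod(axis=1)
    w = pdf * cell
    w /= w.sum()

    def objective(alpha):
        # Stand-in for the design criterion evaluated at one parameter vector.
        return alpha[0] * np.exp(-alpha[1])

    expectation = np.sum(w * np.array([objective(a) for a in nodes]))   # (2.21)
    print(expectation)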
2.3.2 Selecting a Global Optimization Algorithm

The key to determining the optimal design for a given experiment is the ability to compute the global maximum or minimum of the criterion function, depending on how the criterion defines the experimental information. The difference between maximization and minimization is trivial, since the two are computationally equivalent; maximizing a given function is the same as minimizing its opposite (negative) value. Therefore, one can employ the same optimization algorithm for both global maximization and minimization using an appropriately adjusted criterion function.

A good global optimization algorithm must adhere to a number of requirements to ensure that the optimal design is correct. First and most critically, it must be able to determine the global optimum of the criterion function with high probability; any local solution is insufficient for the purpose of optimal experiment design. A researcher must also consider the time that the algorithm takes to converge to the optimum. While one cannot afford to sacrifice precision and accuracy for the sake of a shorter optimization time, computing the information of an experiment is often a computationally intensive process. Repeatedly computing the information provided by different experimental designs to determine an optimum can require an excessive length of time, and many optimal designs may require more than thirty hours of computation time when running on a current high-end personal computer. An investigator must take great care to choose an optimization algorithm that does not waste computational time in the pursuit of a globally optimal design.

To determine the best algorithm for computing the optimal design, this work evaluated and compared four different methods of global optimization: random creep (Bekey et al., 1966; Bekey and Ung, 1974), adaptive random search (Masri et al., 1980; Bekey and Masri, 1983), simulated annealing (Metropolis et al., 1953; Vanderbilt and Louie, 1984), and stochastic gradient (Gelfand and Mitter, 1991). Figure 2.05 records the success rate and average convergence time for each algorithm. The performance of each algorithm was evaluated using a set of six test functions taken from the optimization literature (Table 2.01), optimizing each for one hundred trials from different starting points.

This work adopts the random creep algorithm as the preferred method of global optimization, since it demonstrates the best balance between high accuracy and computational time. This algorithm was originally developed as a "brute force" method to determine the global minimum of a criterion function in an analog computing environment; however, it translates well into digital computing. The technique, flowcharted in Figure 2.06, adds a sequence of random perturbations to the "best" point encountered so far and evaluates the criterion at each of those points. If the algorithm encounters a more optimal value, it updates the "best" point and restarts the search at the starting search radius. After a certain number of failed attempts, the algorithm widens the search radius by increasing the variance of the random variable. This process concludes when the search radius exceeds a predetermined value.
To improve the accuracy, this work modifies the original algorithm to include an additional search that uses decreasing random perturbations to converge on the precise global minimum when the outward search fails to find a better point. While this may appear cumbersome and computationally expensive at first glance, the additional accuracy and precision more than compensate for the added time and evaluations of the objective function.

Figure 2.05: Evaluation of Some Methods of Global Optimization. This chart indicates the average time and number of function evaluations required to compute the optimum, and the percentage of failed searches, for the Adaptive Random Search (ARS), Random Creep (RC), Stochastic Gradient (SG), Simulated Annealing (SA), and Nelder-Mead Simplex (NM) algorithms for the test cases in Table 2.01. The average time is expressed in minutes. This example shows that the Random Creep algorithm provides the best combination of accurate results and shortest computation time.

Test Case 1: Rosenbrock Banana; M = 2
    F(x) = 100 (x_2 - x_1^2)^2 + (1 - x_1)^2
    Optima: (1, 1) = 0 (global)

Test Case 2: 4-d Powell Function; M = 4
    F(x) = (x_1 + 10 x_2)^2 + 5 (x_3 - x_4)^2 + (x_2 - 2 x_3)^4 + 10 (x_1 - x_4)^4
    Optima: (0, 0, 0, 0) = 0 (global)

Test Case 3: Goldstein-Price Function; M = 2
    F(x) = [1 + (x_1 + x_2 + 1)^2 f_1(x)] [30 + (2 x_1 - 3 x_2)^2 f_2(x)], where
    f_1(x) = 19 - 14 x_1 + 3 x_1^2 - 14 x_2 + 6 x_1 x_2 + 3 x_2^2
    f_2(x) = 18 - 32 x_1 + 12 x_1^2 + 48 x_2 - 36 x_1 x_2 + 27 x_2^2
    Optima: (0, -1) = 3 (global); 4 local

Test Case 4: Three-hump Camel Back; M = 2
    F(x) = 2 x_1^2 - 1.05 x_1^4 + 0.167 x_1^6 - x_1 x_2 + x_2^2
    Optima: (0, 0) = 0 (global); (1.75, 0.87) = 0.308 and (-1.75, -0.87) = 0.308 (local)

Test Case 5: Hosaki Function; M = 2
    F(x) = (1 - 8 x_1 + 7 x_1^2 - (7/3) x_1^3 + (1/4) x_1^4) x_2^2 exp(-x_2)
    Optima: (4, 2) = -2.345 (global); (1, 2) = -1.127 (local)

Test Case 6: Multipeak; M = 2
    F(x) = - \sum_{i=1}^{10} ( \| x - a_i \|^2 + c_i )^{-1}
    Optima: (3.92, 3.98) = -2.145 (global); 9 local

    i         1     2     3     4     5     6     7     8     9     10
    a_{i,1}   4.0   2.5   7.5   8.0   2.0   2.0   4.5   8.0   9.5   5.0
    a_{i,2}   4.0   3.8   5.6   8.0   1.0   8.5   9.5   1.0   3.7   0.3
    c_i       0.70  0.73  0.76  0.79  0.82  0.85  0.88  0.91  0.94  0.97

Table 2.01: Test Functions for Evaluating Global Optimization Algorithms. This work employed these six test functions while evaluating the different global optimization algorithms. Each function has a globally optimal solution, and some also have local solutions; the value of M indicates the dimension of the input vector. Plots for many of these test functions are illustrated in Appendix B.

Figure 2.06: Flowchart of the Random Creep Optimization Algorithm. The algorithm consists of two stages. The first searches a region of increasingly large radius up to a maximum value, while the second searches a region of decreasingly small radius down to a minimum tolerance value. If a "better" value is found at some point, the algorithm moves the search to that point and resets the search radius to the previous value.
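The two-stage logic can be sketched as follows; this is a simplified Python rendering for illustration, not the original implementation, and the widening factor, failure limit, stopping radii, and evaluation budget are arbitrary choices.

    import numpy as np

    def random_creep(f, x0, step_grow=2.0, max_fails=20, step_max=100.0,
                     tol=1e-6, max_evals=50_000, seed=0):
        # Two-stage random creep (simplified sketch): an outward search with a
        # growing radius, followed by an inward search with a shrinking radius.
        rng = np.random.default_rng(seed)
        x_best = np.asarray(x0, dtype=float)
        y_best = f(x_best)
        evals = 0

        step, fails = 1.0, 0
        while step <= step_max and evals < max_evals:       # outward stage
            x_try = x_best + step * rng.standard_normal(x_best.shape)
            y_try = f(x_try)
            evals += 1
            if y_try < y_best:
                x_best, y_best, step, fails = x_try, y_try, 1.0, 0
            else:
                fails += 1
                if fails >= max_fails:
                    step, fails = step * step_grow, 0

        step, fails = 1.0, 0
        while step > tol and evals < max_evals:             # inward (refinement) stage
            x_try = x_best + step * rng.standard_normal(x_best.shape)
            y_try = f(x_try)
            evals += 1
            if y_try < y_best:
                x_best, y_best, fails = x_try, y_try, 0
            else:
                fails += 1
                if fails >= max_fails:
                    step, fails = step / step_grow, 0

        return x_best, y_best

    # Example: the Rosenbrock test function from Table 2.01.
    rosenbrock = lambda x: 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2
    print(random_creep(rosenbrock, [-1.5, 2.0]))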
2.4 Experimental Analysis

2.4.1 Parameter Estimation

Once the researcher conducts the experiment, one must estimate the parameters of the experimental model using the observed data. The purpose of the estimation procedure is to determine those parameter values that provide the "best fit" between the model prediction and the observed data according to (2.01). The simplest and most straightforward technique is least-squares estimation, which determines the parameter vector that minimizes the total squared distance between the model prediction and the corresponding data over all N observations; this estimator does not use a variance model and assumes that the variance of the experimental error is constant throughout. However, this assumption does not usually hold in experimental analysis, where the experimental error variance may depend on the magnitude of the measurement or on the experimental apparatus. In these cases, one must employ an estimator that utilizes the variance model to describe the experimental error.

Many consider maximum likelihood estimation (MLE) the gold standard for model parameter estimation due to its generality and relative ease of application (Bard, 1974) and the accuracy of its estimates (Hoel, 1954). Given a set of observations, MLE attempts to determine the parameter values that are most likely within the confines of the experimental and variance models. The maximum likelihood estimate for the observation set z is the parameter vector that maximizes the likelihood function defined in (2.05). When the experiment employs the constant variance model, maximum likelihood estimation reduces to a form analogous to the least-squares estimator. Sometimes, rather than maximize the likelihood directly, one might instead choose to minimize the negative log-likelihood (O_NLL), which simplifies the problem computationally by changing multiplications into additions without changing the estimates computed. As the name suggests, this modification computes the negative logarithm of the likelihood function rather than the likelihood function explicitly. Since the logarithm of a product is equal to the sum of logarithms, (2.05) transforms into

    O_{NLL} = -\log \ell(z \mid \alpha) = -\sum_{i=1}^{N} \log p(z_i; \alpha) .   (2.21)

By combining this result with (2.04), the negative log-likelihood score for a given set of observations becomes

    O_{NLL} = \frac{N}{2} \log(2\pi) + \sum_{i=1}^{N} \left\{ \frac{1}{2} \log g(x_i; \alpha, \sigma) + \frac{\left( z_i - y(x_i; \alpha) \right)^{2}}{2\, g(x_i; \alpha, \sigma)} \right\} .   (2.22)

Minimizing this score over the parameter vector is equivalent to maximizing the likelihood.

One can estimate the model parameters within the Bayesian framework by using the mode of the posterior distribution, instead of the mode of the likelihood function, as a point estimate. This method, called maximum a posteriori probability (MAP) estimation, allows the researcher to incorporate the prior distribution, in addition to the likelihood function, into the inference process. Equation (2.03) shows that the posterior distribution is proportional to the product of the prior distribution and the likelihood; therefore, the MAP algorithm plays out much like the MLE method, except that the likelihood value from (2.05) is multiplied by the prior density at each candidate parameter vector. One can determine the parameter estimate by maximizing this product.
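A minimal sketch of the negative log-likelihood (2.22) and the corresponding MAP score, assuming an illustrative decay model with the parabolic variance model of (2.15); the prior, data, and starting point are made up, and the score is minimized with the Nelder-Mead simplex discussed in the paragraphs that follow.

    import numpy as np
    from scipy.optimize import minimize

    def neg_log_likelihood(alpha, x, z, sigma0=0.05, sigma1=0.02):
        # Illustrative model y = alpha[0]*exp(-alpha[1]*x) with the parabolic
        # variance model g = (sigma1 + sigma0*y)^2 of (2.15); this evaluates (2.22).
        y = alpha[0] * np.exp(-alpha[1] * x)
        g = (sigma1 + sigma0 * y)**2
        return 0.5 * np.sum(np.log(2.0 * np.pi * g) + (z - y)**2 / g)

    def log_prior(alpha):
        # Hypothetical independent lognormal priors centered at 1.0 and 0.5.
        u = np.log(np.array([1.0, 0.5]))
        return np.sum(-np.log(alpha) - 0.5 * ((np.log(alpha) - u) / 0.5)**2)

    def neg_log_posterior(alpha, x, z):
        # MAP score: minimizing this is the same as maximizing prior times likelihood.
        alpha = np.asarray(alpha)
        if np.any(alpha <= 0.0):
            return np.inf                   # outside the lognormal prior's support
        return neg_log_likelihood(alpha, x, z) - log_prior(alpha)

    # Simulated data from the illustrative model.
    rng = np.random.default_rng(1)
    x = np.linspace(0.5, 6.0, 8)
    z = 1.2 * np.exp(-0.4 * x) + 0.02 * rng.standard_normal(x.size)

    fit = minimize(neg_log_posterior, x0=[1.0, 0.5], args=(x, z), method='Nelder-Mead')
    print(fit.x)                            # MAP estimate of the two parameters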
If the prior distribution of an experiment is uniform, then the MAP and maximum likelihood estimation procedures are equivalent, with the parameter estimates bounded by the endpoints of the uniform distribution. Alternative numerical methods of computing the MAP estimate of the parameters have also been developed, particularly methods that employ Markov chain Monte Carlo (MCMC) (Kang, 2000).

Choosing an algorithm to minimize the negative log-likelihood or MAP score is critically important to parameter estimation. A local minimizer will typically suffice, especially if the search is constrained to positive parameter values; this alleviates the need for a more demanding global optimizer. To this end, the Nelder-Mead simplex (Nelder and Mead, 1965) remains the most trusted and widely used method for local, multi-variable optimization. It has even been implemented in the MATLAB software package (MathWorks, 1994) as the default optimization function "fminsearch", which employs a modification of the basic routine (Lagarias et al., 1998) that further improves the rate of convergence of the algorithm. The algorithm constructs a (P+1)-by-P simplex (P+1 vertices in P dimensions) at and around a starting point of P parameters, along with a separate vector of length P+1 whose elements correspond to the value of the criterion at each vertex of the simplex. Based on the values at its vertices, the algorithm reflects, expands, and contracts the simplex about its center until it converges to a single P-dimensional point at the optimal set of parameters. Nelder-Mead can perform the optimizations required for parameter estimation with great speed, which reduces the computation time of estimates to far less than a global optimizer would require.

2.4.2 Measures of Experimental Precision

Due to experimental error, practical data will never exactly match the model prediction for any set of parameter values. Therefore, it is often useful to support the point estimate with the description of an interval in which the parameter occurs with a given probability (e.g., a ninety-five percent chance that the parameter lies between A and B), rather than relying on a single value to represent each parameter. The experimental precision describes the size of this interval and determines the reliability of the estimate; the better the parameterized model fits the data, the narrower this interval and the more reliable the parameter estimates. Therefore, high experimental precision is the hallmark of a well-conducted experiment, because it indicates that both the experimental model and its parameter estimates are very likely to be accurate. A good fit between the model and data assures the researcher that the experiment was successful and verifies the utility of its results, while a poor fit indicates a corrupted data set or an incorrect model that must be returned to the hypothesis and design stage.

While the maximum likelihood estimator computes point estimates of the parameters given a set of observations, the likelihood function itself is a distribution, and researchers have used it to derive methods for describing the uncertainty in parameter estimates computed by MLE. These confidence intervals describe the range of parameter values that are likely to occur with some probability, given the data, the model, and the nature of the experimental error.
The definition of the confidence interval derives from the notion that the likelihood is asymptotically unbiased, with a covariance described by the inverse Fisher information matrix, and that the probability under the likelihood function can be expressed as a given number of standard deviations from the mean. The confidence interval for the MLE estimate, \hat{\alpha}, of the i-th parameter at significance level s is defined as

    CI(\hat{\alpha}_i) = \hat{\alpha}_i \pm \sqrt{ \left[ \left( F^{T} W^{-1} F \right)^{-1} \right]_{ii} } \; t\!\left( 1 - \tfrac{s}{2};\; N - P \right) ,   (2.23)

where F^T W^{-1} F is the Fisher information matrix defined in (2.16) and t represents the quantile of Student's t-distribution at significance level s/2 for N - P degrees of freedom. Since the maximum likelihood estimator is unbiased, the confidence interval extends an equal distance to either side of the point estimate. In the complete absence of experimental error, the covariance equals zero and the confidence interval collapses onto the point estimate. For simplicity, researchers often consider only the 95% confidence interval and report a parameter estimate as the MLE point estimate plus or minus the half-width of its 95% confidence interval.

The derivation of the MLE confidence interval exposes a key flaw, however. The asymptotic assumption holds only for an infinitely large sample size, and provides an only-approximate description of the likelihood function for a large, finite number of observations. For sparse data, the actual shape of the likelihood function may vary considerably, and it is unlikely that the Fisher information matrix adequately describes the estimator covariance for small sample sizes. Therefore, the confidence interval in (2.23) might not adequately describe the estimator precision, and one should take care when using the MLE confidence interval to describe the results of a sparse-data experiment.

Since Bayesian experimentation explicitly expresses the model parameters as random variables described by the prior and posterior distributions, a researcher can accurately determine the precision of the MAP estimate regardless of the sample size, even in the sparse-data scenario. The credibility interval of the posterior distribution describes the range of parameter values that covers a given amount of posterior area, directly corresponding to a specific probability, rather than estimating it from asymptotic statistical assumptions. Figure 2.07 illustrates the Bayesian credibility interval for a given posterior distribution, which, for significance level s, is defined as the interval between the s/2 and (1 - s/2) percentiles of the posterior distribution:

    CI(\hat{\alpha}_i) = \left( P^{-1}\!\left( \tfrac{s}{2} \right),\; P^{-1}\!\left( 1 - \tfrac{s}{2} \right) \right) ,   (2.24)

where P^{-1} denotes the inverse cumulative distribution (percentile) function of the posterior. For example, the ninety-five percent credibility interval for a given posterior estimate lies between the 0.025th and 0.975th percentiles of the posterior distribution. Unlike the confidence interval, the Bayesian credibility interval does not incorporate the point estimate in its computation; this is because the MAP estimator is biased, and the credibility interval does not center on the posterior mode. By computing the interval directly from the posterior distribution and not making any statistical assumptions about its shape, the Bayesian credibility interval avoids the pitfalls exhibited by the MLE confidence interval, and it is the ideal definition of the experimental precision for the sparse-data scenario.

Figure 2.07: The Bayesian Credibility Interval.
Unlike the confidence interval, the credibility interval (grey area) is defined from the actual posterior distribution and applies to sparse-data problems. One computes the credibility boundaries from the percentiles that yield distribution tails (red areas) of the appropriate size, each containing s/2 of the posterior probability. For example, the tails of the 95% credibility interval each contain 2.5% of the posterior probability. Because this estimator is biased, the posterior mode (black dotted line) does not bisect the interval between the lower and upper credibility boundaries (red dotted lines), as it would for the confidence interval.

Chapter 3 Existence of the Optimal Sample Size

3.1 Information Provided by an Experiment

Intuition states that an optimal sample size exists for a given experiment. There are certainly sample sizes that are too small, since one cannot trust one or two samples to provide consistently accurate results for an experiment. There are also sample sizes that are too large; thousands or millions of samples provide such a high degree of resolution that one could literally "connect the dots" and eliminate the need for an experimental model, Bayesian framework, or any other type of prior information. The existence of these extremes implies that an intermediate sample size that exhibits the benefits of both large and small sample sizes also exists. However, attempting to guess this value, as is typically done, is not a good way to determine the number of observations needed for an experiment.

The information that one learns from an experiment provides the foundation of sample size determination, since it directly affects the accuracy and precision of the parameter estimates. Therefore, the ability to compute the information provided by an experiment, and an understanding of how this information accumulates as the sample size increases, is critical to determining which samples contribute to the experimental result and which samples do not. Shannon (1948) provides one of the earliest attempts to quantify information by studying the transmission of messages over various channels of communication. He determines that the amount of information carried by a signal is characterized by its entropy, H, which he defines as the logarithmic measure

    H = -K \sum_{i=1}^{N} p_i \log p_i ,   (3.01)

for a signal with N unique symbols or states, where K is some positive constant and p_i represents the probability that the i-th symbol occurs in the message. According to this measure, a signal consisting of a highly predictable sequence carries very little information, while the most information is carried by a signal that employs a large alphabet with equally likely state probabilities. This implies that when an improbable or unlikely state occurs in a message, the resulting reduction of entropy yields a large amount of information, while events that are very likely to occur provide little information. For example, if the first letter in an English word is a "Q", this clue provides a great deal of information, since this letter has a low likelihood of occurrence in the first position of a word. However, the letter "U" in the second position after the "Q" provides essentially no information at all, since its probability of occurrence there is very close to one. Viewing the information provided by an event as the reduction of entropy in the system provides the cornerstone for the application of information theory to experiment design.
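A tiny numerical illustration of (3.01), taking K = 1 and natural logarithms; the example distributions are made up.

    import numpy as np

    def entropy(p, K=1.0):
        # (3.01): H = -K * sum(p_i * log p_i); terms with p_i = 0 contribute nothing.
        p = np.asarray(p, dtype=float)
        p = p[p > 0]
        return -K * np.sum(p * np.log(p))

    print(entropy([0.25, 0.25, 0.25, 0.25]))   # uniform over 4 states: log(4) ~ 1.386 (maximal)
    print(entropy([0.97, 0.01, 0.01, 0.01]))   # highly predictable signal: ~ 0.168 (little information)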
Kullback and Leibler (1951) expand upon the work of Shannon and attempt to quantify the signal loss caused by the transmission of a message. Their work describes the continuum of message states as continuous probability distributions, and expresses the entropy in terms of the expected value of the logarithm of the probability distribution:

    I\big( p(x) \big) = -\int p(x) \log p(x)\, dx = -\mathrm{E}_{x}\{ \log p(x) \} .   (3.02)

It computes the entropy for the sent and received messages and defines the information lost via transmission as the difference between these entropies; this quantity is known as the Kullback-Leibler divergence between the messages.

Dennis Lindley applied the general concept of information to experiment design by combining Bayesian statistical theory with the concepts established by Shannon, Kullback, and Leibler (Lindley, 1956). He recognized the analogy between transmitted and received messages in communication theory and the knowledge of a state of nature before and after conducting an experiment, and he described this knowledge as a function of the prior and posterior parameter distributions. In this view, experimentation is equivalent to the transmission of a signal, except that experimentation results in a gain of information, whereas transmission results in a loss. Therefore, Lindley quantified the information one expects to gain from the experiment, ε, consisting of an arbitrary number of observations with the prior distribution, p(α), as the reduction of entropy

    I\big( \varepsilon; p(\alpha) \big) = \mathrm{E}_{z}\!\left\{ \mathrm{E}_{\alpha \mid z}\!\left\{ \log\!\left( \frac{p(\alpha \mid z)}{p(\alpha)} \right) \right\} \right\}   (3.03)

that results from conducting the experiment. Since information is gained from the experiment but lost through transmission, Lindley reversed the signs between the communication and experimental information scenarios. This expression is the expectation, taken over the observation space, of the Kullback-Leibler divergence between the posterior and prior distributions, and it illustrates that an informative experiment causes a large "jump" from prior to posterior, in which the posterior precision increases because of the experimental data.

3.2 Diminishing Marginal Utility of Information

Lindley related the concept of experimental information to the work of Blackwell (1953), who determined measures of equivalency between experiments, and derived a series of theorems that describe how pooling the information from multiple independent experiments affects the total information. First, on average, any experiment, ε, is informative as long as the density of the observation, p(z), varies with the parameter value, α, as in (2.01), when the observation can be expressed in terms of an experimental model. In other words,

    I(\varepsilon) \ge 0 .   (3.04)

Note that this refers to the average experimental information; any given experiment might still generate negative information if the results are unexpected or contrary to the assumed model, such as an experiment that only observes outliers or employs an incorrect model. The second theorem characterizes the information redundancy from a pair of independent experiments on the same system, ε_1 and ε_2, which share the same prior distribution and have no knowledge of each other before their execution. The pooled information between these experiments is described by

    I(\varepsilon_1) + I(\varepsilon_2) \ge I(\varepsilon_1) + I(\varepsilon_2 \mid \varepsilon_1) ,   (3.05)

or more simply

    I(\varepsilon_2 \mid \varepsilon_1) \le I(\varepsilon_2) .   (3.06)

In other words, an experiment conducted in tandem with another will always yield less new information than if it were conducted by itself.
This is because some of the information gained from ε_2 needlessly replicates that of ε_1 (Figure 3.01). As one increases the number of pooled experiments, each contributes less new information to the total; eventually, later experiments largely reproduce the information obtained from previous experiments and contribute little to the result. The final theorem characterizes this phenomenon:

    0 \le j_{N+1} - j_{N} \le j_{N} - j_{N-1} ,   (3.07)

where j_N represents the expected information gained from N independent experiments. This states that the amount of new information that the (N+1)-th experiment provides is never more than that provided by the previous (the N-th) experiment. This phenomenon, referred to by economists as diminishing marginal utility, indicates that at some point conducting additional experiments is futile, since they provide negligible new information. Mathematically speaking, the total information from multiple experiments always increases with the addition of experiments, but the curve is concave downward as its derivative converges to zero (Figure 3.02).

Figure 3.01: Redundancy of Information between Independent Experiments. Independent experiments conducted on the same system replicate each other's information with varying degrees of overlap. As the number of experiments increases, each contributes less new information to the whole. Eventually, additional experiments contribute negligible new information and add little value to the experimental result.

These theorems make no specific claims about the size of any given experiment, only that the researcher performs the experiments independently: using the same prior distribution and without knowledge of the outcomes of the other experiments. In this sense, a series of N independent experiments consisting of a single observation each is equivalent to a single experiment consisting of N independent observations. Therefore, Lindley's discussion of the information gained from multiple independent experiments directly relates to the problem of sample size. As an experiment increases the number of observations, the information it yields also increases; however, each subsequent observation provides less new information than the previous one, and the information curve is concave downward, just as with multiple experiments. The key to efficient experimentation is the determination of the optimal sample size, in which each observation provides meaningful information that quantifiably affects the experimental results.

Figure 3.02: Experimental Sample Size and Diminishing Marginal Utility. As the sample size increases, each new sample provides less new information than the previous one did, which causes the curve describing the total information to be concave downward as the sample size increases. This indicates that at some point additional sampling serves little purpose, since these measurements provide negligible new information. The optimal sample size falls in this region of diminished marginal utility.

3.3 Prior Art: Determining the Optimal Sample Size

The determination of the optimal sample size of an experiment has long been of interest to researchers, but computational considerations have limited its implementation to simple experiments, such as binary and hypothesis-based experimentation and control.
However, demand for efficiency and recent increases in computational power have revived interest in methods of optimal sample size determination; one can design experiments that consume less time and funding and provide a greater degree of safety to the experimental subjects without resorting to dramatic and somewhat draconian cost-cutting measures. This section reviews some prior art in this field, including the primary stages in the evolution of the theory of sample size determination relevant to the proposed algorithm. For a more complete treatment, the literature contains a number of detailed reviews of the subject (Adcock, 1997a; Chaloner and Larntz, 1989; Pezeshk, 2003).

The most fundamental procedure for sample size determination employs sequential sampling, in which the researcher decides whether to terminate the experiment or to continue sampling after analyzing each individual observation. One bases the decision to terminate on the risks, costs, and objectives of the experiment, balancing the costs of additional sampling against the consequences of experimental error. These decision functions can be either Bayesian (Lindley and Barnett, 1965; Berger, 1980) or frequentist (non-Bayesian) (Bellman et al., 1961; Wald, 1950), but the basic goal of balancing the costs, risks, and consequences of the experiment remains consistent. However, Bayesian methods are particularly useful over their frequentist counterparts, since they incorporate prior uncertainty into the design and employ a structured framework for quantities such as the experimental utility, loss, and risk. Quality assessments of industrial processes and products commonly use sequential sampling methods, because their simple analysis ensures that intermediate data are readily available after each observation. However, this approach is both inadequate and impractical for general experimentation, especially in the biomedical field, where data analysis requires more time to carry out, and where the need to arrange funding and research facilities requires that the sample size be determined entirely prior to the experiment.

The practical barriers associated with sequential sampling have motivated Bayesian techniques that eliminate the dependence on intermediate data, instead making sample size decisions from the prior distribution and the average predicted losses and risks. These methods are collectively known as preposterior analysis, because the researcher makes the decision about the sample size before collecting data and computing the posterior distribution. Preposterior methods have been particularly constructive in the design of pharmaceutical clinical trials, hypothesis tests, and other so-called binary experiments, where a given observation takes on one of two discrete outcomes: either a "success" or a "failure". In this case, one can express the likelihood function as a simple density, such as the binomial or Bernoulli distribution (Johnson, 1994), and the prior takes the form of a pair of weighted Kronecker delta functions. The posterior, which is proportional to the product of the prior and likelihood function, also takes the form of a discrete distribution. The goal of the experiment is to determine the probability of success. The simplicity of this experimental framework affords the researcher a great deal of latitude to deal with conditions and obstacles specific to the particular experiment.
Consequently, researchers and mathematicians have paid a good deal of attention to developing methods for determining the optimal sample size of binary experiments such as pharmaceutical clinical trials and various mortality and lethality studies. Bayesian preposterior analysis has been invaluable to the design of binary experiments, which are often designed to address specific conditions and obstacles of the particular experiment. Achar (1984) explores designs of multiple-stage clinical trials to determine median survival time with partially censored data. The work chooses a Weibull distribution to represent the likelihood for a fixed initial number of subjects and uses the information extracted from the current stage of the experiment to decide whether to add more subjects to the next stage of the study or to abandon the experiment. The researcher can repeat this process for an arbitrary number of experimental stages until a confident inference can be made. Brooks (1987) considers a similar binary problem that evaluates possible experiments and sample sizes from the expected gain in information from each proposed experiment; that work chooses the smallest sample size that reduces the expected standard error below a minimum acceptable value. Later methods fully incorporate the Bayesian risk to determine the optimal sample size of a binary experiment (Sylvester, 1988). Inferences about the effectiveness of a drug are made using the fewest possible number of subjects, preserving the integrity of the experiment while minimizing potentially harmful exposure to the drug. One derives the risk function for this experiment as a function of the sample size by considering the financial and ethical costs, the number of subjects, et cetera; the optimal sample size minimizes the average risk over the prior distribution. These techniques illustrate how the optimal sample size can be computed for a given binary experiment using Bayesian methods without the benefit of intermediate data, and they pave the way for more generally applicable methods of sample size determination.

The earliest attempts to determine the optimal sample size are tailored very specifically to the goal of the experiment and are not widely applicable to other experiments. This strict limitation inspired the development of generalized sample size determination (SSD) criteria for binary experimentation. A primary goal of sample size determination has consistently been the generation of inferences that are resistant to experimental error; DasGupta and Mukhopadhyay (1994) propose a method to determine the smallest sample size that provides a certain degree of posterior robustness, minimizing the effects of experimental error and subsequently reducing the posterior risk. The optimal sample size corresponds to the smallest N that satisfies the expression

    \sup\, p(\alpha \mid z, N) - \inf\, p(\alpha \mid z, N) \le c ,   (3.08)

where c takes on some predetermined value corresponding to the allowable posterior width. This ensures that the range of the posterior distribution varies by no more than c, independent of the actual observations, and protects the experiment from unwanted noise. However, this method has the drawback of being unable to deal with multimodal posterior distributions, and it is completely defeated by a counterexample in which the posterior has one tall, narrow peak and a shorter, broader peak.
By focusing on limiting the absolute error of the experiment instead of costs and risks, which may be abstract and difficult to determine precisely, researchers developed a new way to look at sample size determination. Some of these methods use the statistical concept of posterior tolerance regions (Fraser and Guttman, 1956) to determine the minimum sample size that achieves at most a given degree of parameter uncertainty in the posterior distribution. Joseph et al. (1995) consider the highest posterior density (HPD) interval as the basis of a pair of criteria that aim for at most a given degree of posterior uncertainty. The HPD interval is a well-defined region of length l within the parameter space that contains the maximum amount of posterior area. The authors define the first SSD rule, the average coverage criterion (ACC), as

    \int \left\{ \int_{a}^{a+l} p(\alpha \mid z)\, d\alpha \right\} p(z)\, dz \;\ge\; 1 - \sigma ,   (3.09)

where σ is a predetermined value corresponding to the maximum allowable imprecision, and the highest posterior density region of length l is defined on the closed interval [a, a+l]. For a fixed value of l, one increases the sample size until, on average over the data, at least (1 - σ) of the posterior area falls into the HPD region. The same authors also present a complementary rule for sample size determination, the average length criterion (ALC), which reverses the roles of σ and l. In this case, one increases the sample size until the HPD region that covers (1 - σ) of the posterior area becomes, on average, smaller than the given value of l. The primary advantages of the ACC and ALC methods are their conceptual simplicity and intuitive nature, which allow the researcher to design an experiment in terms of the maximum acceptable error rather than by the costs or utility provided by the experiment, as is typical in Bayesian decision theory. Researchers have applied these methods to binary experiments, as well as to others that utilize a simple likelihood function and a single parameter. However, for higher-dimensional posterior distributions, these methods require an elucidation of the posterior distribution that might not be analytically available.

A fully Bayesian method of sample size determination is outlined by Lindley (1997a), who criticizes the ACC and ALC criteria for their lack of a utility function (Lindley, 1997b). His proposed rule, called maximized expected utility (MEU), provides an approach that deals simultaneously with the problems of sample size determination and experiment design. The criterion is defined as

    \max_{N} \left\{ \max_{x} \sum_{z} \left[ \int u(x, \alpha)\, p(\alpha \mid z, N)\, d\alpha \right] p(z \mid N) \; - \; cN \right\} ,   (3.10)

where u(x, α) represents the utility function of the experimental design x. One assumes that the utility is independent of z and that each individual observation incurs a constant utility cost c (Raiffa and Schlaifer, 1961), measured in "utiles". This method is cohesive, since it carries out design and sample size determination in a single step and expresses both the experimental information and the costs in terms of utility, making them comparable quantities. However, defining utility is not a trivial task; the researcher must be able to assign value to both the ethical and financial costs of the experiment and must know the value of a given degree of accuracy. Consequently, this method has drawn its own criticism for its difficulty of implementation and its computational complexity (Pham-Gia, 1997; Adcock, 1997b).
While one could hypothetically apply many of these methods to experiments that use a continuous observation space, sample size determination presently continues to focus on inference experiments with binary and discrete outcomes and on hypothesis testing, although the applications for these methods have grown considerably. For example, Normand and Zou (2002) design experiments using observation clusters to make inferences about the quality of health care across various institutions, while Wang and Gelfand (2002) determine the optimal sample size using a simulation-based approach that alternates stages of model fitting and posterior approximation for linear and random-effects models. Sample size determination methods have also been applied to the testing of the multiple hypotheses required for genetic experiments that use DNA microarrays (Müller et al., 2004a), and to designing experiments to determine whether certain risk factors likely contribute to the occurrence of a given disease (DeSantis et al., 2004). Due to various computational concerns and considerations about implementation, most current procedures for sample size determination are highly specialized, tailored to the particular experiment in order to take advantage of analytical shortcuts. To date, a general and robust method of sample size determination that works in both continuous parameter and observation spaces does not exist.

Chapter 4
Posterior Sampling and Markov Chain Simulation

4.1 Introduction to Markov Processes

Posterior distributions can be an elusive lot. Analytical expressions exist for only the simplest experiments; in general, explicitly determining the posterior distribution requires not only a series of multiplications between the prior and the likelihood, but also the expectation of the likelihood over the prior, which corresponds to the denominator of Bayes' Theorem. This poses a computationally crushing problem, yet the ability to perform mathematical operations, particularly multidimensional integration, on the posterior distribution is fundamental to Bayesian analysis. In these circumstances, one can use posterior simulation, which employs a random sequence of samples from the posterior distribution in place of an analytical or explicit description to analyze a Bayesian system. As long as the size of the random sample is sufficiently large, the distribution represented by the sequence will accurately describe the posterior. However, when one cannot draw independent samples directly from the posterior distribution, it can often be just as useful to simulate dependent samples using the posterior distribution as their target (Geyer, 1992). As long as the dependent sequence is both stationary (the ensemble statistics do not change with additional samples) and ergodic (independent of the starting value), the sequence of dependent samples performs just as well in analysis as an independent sequence.

Perhaps the most familiar and widely explored dependent random sequence is the Markov chain, in which each element in the random sequence depends on the immediately previous value, but not on any others. This concept resulted from the work of the Russian mathematician Andrei Markov, who in the early twentieth century devised a probability model that characterized the occurrence of consonants and vowels in the verse novel Eugene Onegin by Alexander Pushkin. Markov demonstrated that the occurrence of a given letter depended only on the immediately previous letter, and no others (Gamerman, 1997).
This phenomenon was later used to describe other random processes in both discrete and continuous state spaces. In general, a stochastic process that generates the ensemble, A, is considered a Markov process if it satisfies the Markov property,

p(A_n = s \mid A_0, A_1, \ldots, A_{n-1}) = p(A_n = s \mid A_{n-1}),    (4.01)

for all positive values of n and some state, s (Grimmett and Stirzaker, 1992). In other words, the Markov property requires that every value in the sequence be dependent on the previous one alone, and that each value in the sequence affects only the next. The random sequence, A, that results from the Markov process is called a Markov chain. The probability that the chain makes the transition from state i to state j is defined as

p_{ij} = p(A_{n+1} = j \mid A_n = i),    (4.02)

where all values of p_{ij} are positive and sum to one over all possible transitions. A given Markov chain, A, is considered to be homogeneous if

p(A_{n+1} = j \mid A_n = i) = p(A_1 = j \mid A_0 = i).    (4.03)

In a homogeneous Markov chain, the transition probabilities do not depend on the specific position of the sample in the chain; each value of p_{ij} remains constant as the chain evolves. Because sampling from a stationary distribution requires a fixed set of transition probabilities, this work only considers homogeneous Markov chains.

To ensure that the Markov chain converges to the posterior distribution, its distribution must be stationary. For this to occur, the chain must exhibit three key properties (Roberts, 1996). It must be positive recurrent, meaning that the chain can return to any state infinitely often as its length increases, and irreducible, meaning that there is a positive transition probability from state i to state j, and vice-versa, for all possible states of the chain. Finally, the Markov chain must be aperiodic so that it does not oscillate between different states and can freely walk the entire parameter range. One considers the sampling distribution, π, to be stationary if it satisfies the condition

\sum_{i} \pi_i p_{ij} = \pi_j    (4.04)

for states i and j, where \sum_{j} \pi_j = 1. This condition is also commonly expressed in vector-matrix notation as πP = π. When this expression holds for the transition set in P, the corresponding Markov chain is guaranteed to converge to a stationary distribution. One ensures ergodicity in the Markov chain by discarding the first several samples from the beginning of the sequence, called the "burn-in" period. Since the researcher arbitrarily determines the starting value in any posterior Markov chain, letting the chain run for a short period and eliminating these samples allows the Markov chain to evolve independently of its initial condition. This becomes important when one begins to perform operations on the Markov chain in order to determine the nature of the posterior distribution.

4.2 Posterior Simulation Using Markov Chains

To carry out posterior simulation, one can construct a Markov chain whose elements simulate random draws from the posterior distribution. The method to generate dependent realizations of the posterior distribution was originally developed by Metropolis et al. (1953) and later refined by Hastings (1970) into what is known as the Metropolis-Hastings algorithm. This algorithm generates samples from a target distribution, π, through a process that accepts or rejects samples drawn from a proposal distribution, q, which encompasses the range of the target distribution.
The method draws samples from the proposal distribution one at a time and evaluates each based on the transition probability that the chain will move from the current state, i, to the proposed state, j, defined as

p_{ij} = \frac{\pi_j q_{ji}}{\pi_i q_{ij}}.    (4.05)

The rule accepts the proposed value with probability p_{ij}; when this value exceeds one, the rule automatically accepts the sample. If the sample is rejected, then the previous value is repeated in the chain. When the samples from the proposal distribution are drawn independently of one another, then q_{ij} = q_j and q_{ji} = q_i, and the transition probability simplifies to

p_{ij} = \frac{\pi_j q_i}{\pi_i q_j}    (4.06)

(Tierney, 1994). The algorithm that uses this decision rule to determine the transition between states is called Independence Metropolis-Hastings (IMH).

One can prove the stationarity of the target distribution of an IMH Markov chain by looking at the derivation of the algorithm. First, the transition probability is designed so that it satisfies the reversibility condition

\pi_i p_{ij} = \pi_j p_{ji}.    (4.07)

This ensures that the target distribution satisfies the condition in (4.04):

\sum_{i} \pi_i p_{ij} = \sum_{i} \pi_j p_{ji} = \pi_j \sum_{i} p_{ji} = \pi_j.    (4.08)

Hastings defined the transition probabilities as

p_{ij} = q_{ij} \alpha_{ij} \quad \text{for all } i \ne j, \qquad p_{ii} = 1 - \sum_{j \ne i} p_{ij},    (4.09)

where

\alpha_{ij} = \frac{s_{ij}}{1 + \dfrac{\pi_i q_{ij}}{\pi_j q_{ji}}}.    (4.10)

The values of q_{ij} and q_{ji} come from the transition matrix of some arbitrary Markov chain corresponding to the proposal distribution, and s_{ij} is a symmetric function of i and j chosen so that 0 ≤ α_{ij} ≤ 1 for all i and j. One can show, with a little algebraic manipulation, that (4.09) and (4.10) satisfy the reversibility condition. From the work of Metropolis et al. (1953), Hastings proposes

s_{ij} = \begin{cases} 1 + \dfrac{\pi_i q_{ij}}{\pi_j q_{ji}}, & \text{if } \dfrac{\pi_j q_{ji}}{\pi_i q_{ij}} \ge 1, \\[1ex] 1 + \dfrac{\pi_j q_{ji}}{\pi_i q_{ij}}, & \text{if } \dfrac{\pi_j q_{ji}}{\pi_i q_{ij}} < 1, \end{cases}    (4.11)

as a possible choice for s_{ij}. Combining this choice with (4.09) and (4.10) gives the acceptance probabilities used in (4.05) and (4.06). Therefore, the distributions generated by the Metropolis-Hastings and IMH algorithms are stationary, provided the chain is given a sufficient length.

Figure 4.01: Flowchart of the Independence Metropolis-Hastings Algorithm. One can simulate a sequence of dependent random samples from the posterior distribution by using the prior distribution as the proposal distribution, sampling from it repeatedly, and accepting each proposed value according to the likelihood ratio between the proposed and previous samples in (4.12). Discarding a set number of "burn-in" samples ensures the ergodicity of the Markov sequence. The "target" distribution generated by this process corresponds to the posterior distribution of the experiment.

The Independence Metropolis-Hastings algorithm easily simulates a posterior distribution by using the prior as the proposal distribution and the posterior as the target density. Recalling Bayes' Theorem in (2.03), one can see that the ratio of the target to proposal distributions is the ratio of posterior to prior, which is proportional to the likelihood function. Therefore, the transition probability in (4.06) simplifies to

p_{ij} = \frac{\ell(z \mid \alpha_j)}{\ell(z \mid \alpha_i)}.    (4.12)

By evaluating the likelihood of a sample drawn from the prior distribution, one can generate a population of samples that represent simulated draws from the posterior distribution with relative ease; a minimal sketch of this procedure appears below.
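To make the procedure of Figure 4.01 concrete, the following sketch implements the IMH sampler for a one-parameter model. It is purely illustrative: the normal prior and single-observation likelihood are toy stand-ins for the experiment-specific models (they are not part of any API described in this work), and the acceptance step follows the likelihood-ratio rule in (4.12).

```cpp
#include <cmath>
#include <random>
#include <vector>

// Toy stand-ins for the experiment-specific models: a N(0,1) prior on alpha and a
// single observation z = 1.2 with unit error variance (assumptions for illustration).
double samplePrior(std::mt19937 &rng)
{
    std::normal_distribution<double> prior(0.0, 1.0);
    return prior(rng);
}

double likelihood(double fAlpha)
{
    const double fZ = 1.2;                                    // fixed observation
    return std::exp(-0.5 * (fZ - fAlpha) * (fZ - fAlpha));    // l(z | alpha), up to a constant
}

// Independence Metropolis-Hastings: the prior is the proposal, the posterior the target.
std::vector<double> simulatePosteriorIMH(std::size_t nLength, std::size_t nBurnIn,
                                         double fStart, std::mt19937 &rng)
{
    std::uniform_real_distribution<double> U(0.0, 1.0);

    double fCurrent = fStart;                  // e.g., the prior mean, as in Figure 4.01
    double fLikeOld = likelihood(fCurrent);    // l(z | A_0)

    std::vector<double> mcChain;
    mcChain.reserve(nLength);

    for (std::size_t i = 0; i < nLength + nBurnIn; ++i) {
        double fProposal = samplePrior(rng);          // alpha_j ~ p(alpha)
        double fLikeNew  = likelihood(fProposal);     // l(z | alpha_j)

        // Accept with probability min(1, l(z|alpha_j)/l(z|alpha_i)), per (4.12);
        // on rejection the previous value is repeated in the chain.
        if (fLikeNew / fLikeOld >= U(rng)) {
            fCurrent = fProposal;
            fLikeOld = fLikeNew;
        }
        if (i >= nBurnIn)                             // discard the burn-in samples
            mcChain.push_back(fCurrent);
    }
    return mcChain;
}
```

Running this routine with, for example, a chain length of 10,000 and a burn-in of 1,000 yields a dependent sample whose histogram approximates the posterior of the toy model; the same loop applies unchanged to any prior sampler and likelihood function.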
To simulate draws from the posterior distribution, one draws a random value from the prior distribution, evaluates the likelihood ratio, and accepts or rejects the value based on (4.12). The Markov chain approximates the posterior distribution after it has become stationary upon reaching an adequate length. Since the likelihood function is well defined in Bayesian experimentation and the generation of independent samples from the prior distribution is a trivial task, this work uses the IMH algorithm detailed in Figure 4.01 exclusively to generate all of the Markov chains used for posterior simulation.

4.3 Preposterior Distributions and Posterior Predictive Simulation

Kang (2000) demonstrates how one can use posterior Markov chain simulation to estimate the model parameters of an experiment. The researcher simulates a posterior Markov chain with the IMH algorithm, evaluating the likelihood ratio in (4.12) for each proposed α_j using the data in z. The best parameter estimate under the maximum a posteriori (MAP) estimation criterion is located at the mode of the distribution, which can be determined from the posterior Markov chain.

The use of Markov chain simulation to design an experiment requires a different approach. Since the researcher does not have the benefit of a data set, one cannot directly apply the IMH algorithm to generate a posterior Markov chain. Therefore, rather than considering the observation in z as a fixed quantity, it must be treated as an unknown random variable described by the probability density p(z), called the preposterior distribution, which considers the entire collection of potential data sets that might be observed from the experiment. The preposterior distribution employs all a priori knowledge about the experiment to describe the probability that it will yield a given observation, and it is defined by the denominator in Bayes' Theorem, which describes the expectation of the likelihood function over the prior distribution:

p(z) = \int \ell(z \mid \alpha)\, p(\alpha)\, d\alpha = \mathbb{E}_{p(\alpha)}\{\ell(z \mid \alpha)\}.    (4.13)

While Bayesians usually employ this expression as a simple normalization constant for a fixed z in (2.03), this distribution exhibits a number of properties that prove quite useful in the design of experiments, when the precise value of z is unknown. First, the preposterior distribution spans the entire observation space and represents within it every experimental outcome that the researcher could possibly observe. This makes the preposterior distribution a complete predictor of experimental behavior, and given an adequate number of samples from p(z), a researcher can design an experiment that accounts for every possible contingency in the system under study. Next, because it integrates the likelihood over the entire prior distribution through the expectation operator, it is independent of any specific value of α; this allows the researcher to simulate a data set without any knowledge of the parameter set beyond the prior distribution. Finally, when a researcher makes the observations in an experiment independently of one another, the N-dimensional preposterior distribution is reducible to a series of parallel univariate distributions.
One can recognize this paradigm from elementary statistics; any multidimensional sample, X, taken from a multivariate density with a purely diagonal covariance matrix can be reduced to a number of samples from a series of similar univariate distributions:

X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix}
\sim p_N\!\left( \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_N \end{bmatrix},
\begin{bmatrix} \Sigma_{1,1} & 0 & \cdots & 0 \\ 0 & \Sigma_{2,2} & & \\ \vdots & & \ddots & \\ 0 & & & \Sigma_{N,N} \end{bmatrix} \right)
\;\Longleftrightarrow\;
x_1 \sim p(\mu_1, \Sigma_{1,1}),\; x_2 \sim p(\mu_2, \Sigma_{2,2}),\; \ldots,\; x_N \sim p(\mu_N, \Sigma_{N,N}),    (4.14)

where p(μ,Σ) is the univariate version of the multivariate density p_N(μ,Σ). Therefore, one can sample from the univariate preposterior distribution N times rather than sampling once from the N-dimensional preposterior, analogous to the equivalence of rolling N dice simultaneously and rolling one die N times. The ability to express the preposterior distribution of an experiment with a single univariate distribution relieves a great deal of computational complexity and provides robustness when comparing similar experiments with different sample sizes.

To account for all of the possible outcomes that an experiment might exhibit, one must generate random samples from the preposterior distribution. Recalling from (2.02) and (2.04) that each observation is an independent random variable from a normal distribution with mean y and variance g, and that the likelihood function derives from this characterization, one can simulate independent observations from the preposterior distribution in (4.13). To obtain a single observation, Z, randomly drawn from the preposterior distribution,

Z \sim p(z) = \mathbb{E}_{p(\alpha)}\{\ell(z \mid \alpha)\},    (4.15)

one must sample from the expectation of the likelihood function over the prior. Fortunately, it is much easier to sample from this expression than to determine its density function analytically. Since random samples drawn from a probability density expressed as the average of a number of independent probability distributions have equal probability of belonging to any of the sub-distributions, a large sample drawn from the composite distribution is composed of the union (combination) of samples of equal size drawn from the smaller distributions. To sample from (4.15), the contributing densities reflect the likelihood function at an arbitrary α randomly drawn from the prior. As the number of different likelihood densities increases without bound, the average converges to the expectation over the prior distribution. Therefore, one can express (4.15) as the combination of samples of equal size from each of the individual likelihood densities at various values of α:

Z = \bigcup_{i=1}^{K} \{Z_i\}.    (4.16)

Z_i represents the i-th of K random subsamples drawn from the likelihood at the parameter value α_i,

Z_i \sim \ell(z \mid \alpha_i),    (4.17)

where

\alpha_i \sim p(\alpha).    (4.18)

There is no fixed minimum size for the set Z_i; if each set is limited to a single value, the set Z simplifies to a random sequence simulated from (4.16) through (4.18) in reverse. This consists of simulating a prior vector and using the likelihood to randomly generate N observations for the design matrix, x. Repeating this process for a sufficiently large value of K generates a large sample, Z, which provides a precise description of the preposterior distribution. A simple demonstration (Figure 4.02) validates this algorithm, showing the tight association between the probability density of a large random sample and the analytically derived preposterior distribution for a sample experiment. A minimal sketch of the procedure appears below.
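The reverse procedure just described reduces to a short loop: draw a parameter vector from the prior, compute the model prediction y at each design point, and corrupt that prediction with the error distribution N(y, g). The sketch below is a minimal illustration under the assumption that the caller supplies the experiment-specific prior sampler, model prediction, and variance model; none of the names are taken from an existing API.

```cpp
#include <cmath>
#include <functional>
#include <random>
#include <vector>

// Simulate one observation vector z ~ p(z) for a design x, following (4.15)-(4.18).
// The three callables are experiment-specific and supplied by the caller:
//   fnSamplePrior(rng)   -> one parameter vector alpha ~ p(alpha), as in (4.18)
//   fnPredict(x, alpha)  -> model prediction y at design point x
//   fnVariance(y)        -> variance model g evaluated at the prediction y
std::vector<double> simulatePreposterior(
    const std::vector<double> &vDesign, std::mt19937 &rng,
    const std::function<std::vector<double>(std::mt19937 &)> &fnSamplePrior,
    const std::function<double(double, const std::vector<double> &)> &fnPredict,
    const std::function<double(double)> &fnVariance)
{
    std::vector<double> vAlpha = fnSamplePrior(rng);     // alpha_i ~ p(alpha)
    std::vector<double> vZ;
    vZ.reserve(vDesign.size());
    for (double fX : vDesign) {
        double fY = fnPredict(fX, vAlpha);               // predicted response y
        std::normal_distribution<double> noise(fY, std::sqrt(fnVariance(fY)));
        vZ.push_back(noise(rng));                        // one draw from N(y, g), as in (4.17)
    }
    return vZ;
}
```

Calling this routine K times, each call beginning with a fresh prior draw, and pooling the resulting vectors reproduces the union in (4.16) and therefore a large sample from p(z), exactly the behavior validated in Figure 4.02.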
Figure 4.02: Preposterior Sampling by Transformation and Resampling. A histogram of the random sample produced by the resampling algorithm in (4.15) (red dots) compares favorably with the preposterior distribution that one can compute analytically (green surface). The close agreement between the analytical and numerical results confirms that sampling from the prior, predicting a response, and resampling using the experimental error distribution N(y,g) is a good method for sampling from the preposterior distribution.

A researcher can simulate a Markov process for an experiment design from the preposterior distribution instead of a fixed data set using posterior predictive simulation (Müller et al., 2004b; Quintana and Müller, 2004). This method takes the entire range of possible observations into account by running many posterior Markov chains in parallel, where each is generated using a different set of observations drawn from the preposterior distribution. Using the aforementioned method for simulating observations from the preposterior distribution, one first computes a series of K simulated experiments. Then, K Markov chains are generated, one for each of the simulated data sets; each of these subchains corresponds to a possible experimental outcome and reflects the individual posterior distribution of its own respective data set. The value of K and the length of each subchain are determined by the investigator, but should be large enough to represent the respective expectations of z and α adequately. One determines the marginal posterior distribution for the parameters from this sample by combining the subchains into the super-chain, A, according to the expression

A = \bigcup_{k=1}^{K} \{A_k\} \sim \mathbb{E}_{p(z)}\{p(\alpha \mid z)\},    (4.19)

where A_k represents the Markov chain generated from the k-th posterior distribution using the simulated data set z_k,

A_k \sim p(\alpha \mid z_k),    (4.20)

and

z_k \sim p(z).    (4.21)

The rationale for combining the random sets is the same as that used in (4.15). The marginal posterior distribution describes the minimal experimental result, or worst-case scenario, before making any observations; the most ill-behaved and noisy set of observation data will provide a result that is at least as good as this distribution. This provides useful insight when designing the experiment, since it allows a researcher to gauge how a given experiment is likely to affect one's opinions about the informative parameters.

4.4 Diagnostics to Monitor the Convergence of a Markov Process

As previously mentioned in Section 4.1, a Markov chain must be both ergodic and stationary to simulate from the posterior distribution accurately. The effective implementation of posterior simulation requires an understanding of the evolution of the Markov chain as it converges to its target. The IMH algorithm uses a random walk to accumulate samples that reflect the posterior distribution, and this walk evolves through three phases corresponding to the maturity of the random sequence. At first, the random walk exhibits sensitivity to the starting point of the search. The initial phase of the Markov chain consists of the portion of the sequence dependent upon the starting value of the chain. The extent of this dependency can be determined by computing the autocorrelation of the Markov chain and determining the chain length at which the initial dependence has been eliminated (Geyer, 1992; Kang, 2000).
Samples from this initial phase of the Markov chain are neither ergodic nor stationary, and the researcher discards them; the "true" portion of the posterior Markov chain that one finds useful begins with its second phase. The transient phase of the Markov chain consists of those samples that have overcome their dependence on the starting value of the chain, but whose statistics fluctuate as the length of the chain increases. At this stage, the random walk explores the range of the target distribution in a process called "mixing". As the random sample mixes and expands over the target distribution, its statistics gradually converge until it enters its stationary phase. At this point, the Markov chain has fully converged to its target distribution, and the addition of new samples has no effect on the distribution of the chain. One cannot use a Markov chain for posterior simulation until it has completed its transient phase and entered the stationary phase; only at this time is it both ergodic and stationary.

Statisticians have developed a number of diagnostic methods to assess the convergence of a Markov chain and determine the end of the transient phase. Brooks and Roberts (1998) provide a review of some of these methods. The potential scale reduction factor (PSRF) diagnostic is a classic example of a variance-ratio based convergence method for a univariate sequence of random values (Gelman and Rubin, 1992). The algorithm requires m parallel random sequences of length n, started from different initial values taken from an "overdispersed" version of the source distribution. This measure estimates the fraction of the total information about the target distribution provided by the random sequence, A, and indicates how this information is likely to improve with added random samples. The reduction factor, R̂, defines the convergence of the Markov process as the ratio of the pooled posterior variance estimate between the parallel chains, V̂, to the mean of the within-sequence variance across the parallel chains, W. A researcher can easily compute both of these quantities from the parallel Markov chains. As the random sample converges to the target distribution, these variances approach equality and the reduction factor converges to one. A researcher can track the convergence of the Markov chain by computing the reduction factor for sequences of increasing length and monitoring its convergence to one. Since this method provides a univariate diagnostic of convergence, multivariate random sequences require multiple PSRF runs, one for each parameter, which increases the computational time required to determine the convergence of the Markov process.

To deal with this limitation for multivariate Markov chains, Brooks and Gelman (1998) pioneered the multivariate potential scale reduction factor (MPSRF), which expands the PSRF concept to diagnose the convergence of all parameters at once in a single measurement, while adding only a marginal increase in computational complexity. The preliminary steps of MPSRF involve the simulation of multiple Markov chains, the segmentation of each chain into batches, and the construction of sequences of increasing length; these steps are identical to those of the PSRF method.
By applying the reduction factor to the various sequences, one can graphically monitor the convergence of the sequence toward a stationary distribution as a function of the sequence length. Rather than use the ratio of V̂ to W, the MPSRF defines the reduction factor as a scalar quantity that describes the distance between two matrices based on the maximum root statistic:

\hat{R} = \frac{n-1}{n} + \left(\frac{m+1}{m}\right)\lambda,    (4.22)

where λ is the largest eigenvalue of the positive-definite matrix W^{-1}B/n. The matrix B/n denotes the between-sequence covariance,

B/n = \frac{1}{m-1} \sum_{i=1}^{m} \left(\bar{A}_{i\cdot} - \bar{A}_{\cdot\cdot}\right)\left(\bar{A}_{i\cdot} - \bar{A}_{\cdot\cdot}\right)^{T},    (4.23)

while W represents the within-sequence covariance,

W = \frac{1}{m(n-1)} \sum_{i=1}^{m} \sum_{j=1}^{n} \left(A_{ij} - \bar{A}_{i\cdot}\right)\left(A_{ij} - \bar{A}_{i\cdot}\right)^{T},    (4.24)

where

\bar{A}_{i\cdot} = \frac{1}{n} \sum_{j=1}^{n} A_{ij},    (4.25)

and

\bar{A}_{\cdot\cdot} = \frac{1}{m} \sum_{i=1}^{m} \bar{A}_{i\cdot}.    (4.26)

As with the PSRF diagnostic, the MPSRF reduction factor converges to one as the random sequence becomes stationary. This diagnostic is much more convenient than the PSRF for multivariate Markov processes, since it does not require the researcher to monitor the convergence of each parameter separately.

One can monitor the convergence of the Markov process to the stationary target distribution and determine the end of the transient phase by iteratively computing the MPSRF reduction factor for a series of parallel Markov chains of increasing length. The researcher carries this out by simulating m parallel chains of a fixed maximum length, subdividing each into a series of equally sized batches of length b, and sequentially combining these batches into a series of subchains of increasing length. These subchains represent the evolution of the random sequence as its length increases; the first subchain contains only the first batch, the second contains the first two batches, and so on, until the final chain, which combines all of the batches. To account for burn-in, the researcher must discard an appropriate number of samples from the sequence; the algorithm does not consider these samples, and they do not contribute to the value of n. By computing the reduction factor over a range of sequence lengths, the researcher can determine how long a Markov chain must be for a given experiment to achieve stationary convergence to the posterior distribution. However, one must take care not to terminate the sequence too soon; often the reduction factor will converge to one only to jump away when the random walk finds a new region of the distribution. The graphical method allows the researcher to verify that the chain has achieved true convergence through the stabilization of the reduction factor over an extended length of the sequence. While this diagnostic does not provide a cure-all for determining the convergence length of a Markov process, and even its authors warn against using it as the exclusive determinant of chain length (Kass et al., 1998), it does provide a concrete starting point for determining the minimum required length of a Markov chain.

When using posterior predictive simulation, one must modify the diagnostic algorithm to account for the multiple parallel Markov chains employed by the process. When dealing with parallel Markov chains, the objective of the diagnostic is to determine the chain length that achieves convergence for any chain that the Markov process might generate; a sketch of the core MPSRF computation for a single set of parallel chains appears below.
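The core of the MPSRF computation in (4.22) through (4.26) amounts to two covariance matrices and one eigenvalue. The sketch below is illustrative rather than the implementation used in this work: it assumes the parallel chains are stored with burn-in already removed and uses the Eigen linear-algebra library for the matrix inverse and the eigenvalue decomposition.

```cpp
#include <Eigen/Dense>
#include <vector>

// chains[i][j] is the j-th (post burn-in) P-dimensional sample of the i-th parallel chain.
double computeMPSRF(const std::vector<std::vector<Eigen::VectorXd>> &chains)
{
    const int m = static_cast<int>(chains.size());        // number of parallel chains
    const int n = static_cast<int>(chains[0].size());     // samples per chain
    const int P = static_cast<int>(chains[0][0].size());  // number of parameters

    // Per-chain means (4.25) and the grand mean (4.26).
    std::vector<Eigen::VectorXd> vChainMean(m, Eigen::VectorXd::Zero(P));
    Eigen::VectorXd vGrandMean = Eigen::VectorXd::Zero(P);
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) vChainMean[i] += chains[i][j];
        vChainMean[i] /= double(n);
        vGrandMean += vChainMean[i];
    }
    vGrandMean /= double(m);

    // Between-sequence covariance B/n (4.23) and within-sequence covariance W (4.24).
    Eigen::MatrixXd mBn = Eigen::MatrixXd::Zero(P, P);
    Eigen::MatrixXd mW  = Eigen::MatrixXd::Zero(P, P);
    for (int i = 0; i < m; ++i) {
        Eigen::VectorXd d = vChainMean[i] - vGrandMean;
        mBn += d * d.transpose();
        for (int j = 0; j < n; ++j) {
            Eigen::VectorXd e = chains[i][j] - vChainMean[i];
            mW += e * e.transpose();
        }
    }
    mBn /= double(m - 1);
    mW  /= double(m) * double(n - 1);

    // Largest eigenvalue of W^{-1} B/n, then the reduction factor from (4.22).
    Eigen::EigenSolver<Eigen::MatrixXd> es(mW.inverse() * mBn);
    double fLambda = es.eigenvalues().real().maxCoeff();
    return (double(n) - 1.0) / double(n) + (double(m) + 1.0) / double(m) * fLambda;
}
```

In practice one evaluates this routine on subchains of increasing length and plots the resulting R̂ values against length, as described in the text, rather than relying on a single evaluation.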
Determining such a length is much more difficult than for a single Markov chain, since one must take care to represent the entire preposterior distribution rather than just a single observation vector. The best way to carry this out is to compute the MPSRF convergence diagnostic independently for a series of simulated data vectors from the preposterior distribution, and then to combine the reduction factors at each Markov chain length across the parallel chains into a single quantity that one can use to track the convergence of the random process. This work determines the convergence of the posterior predictive process by computing the reduction factor for each Markov chain and using the root-mean-square to combine the values across the set of parallel chains at each chain length. First, one simulates a set of data from the preposterior distribution and computes the MPSRF for m different chains started from different positions; this process is repeated for K simulated data sets to produce a series of K reduction factors. To eliminate outliers and smooth the response curves, this work discards the largest one percent of reduction factors at each chain length. For K posterior predictive Markov chains, the RMS at each chain length is defined as

\hat{R}_{RMS} = \sqrt{\frac{1}{K} \sum_{i=1}^{K} \hat{R}_i^{\,2}},    (4.27)

where R̂_i indicates the reduction factor value at a given chain length for the i-th posterior predictive chain. The RMS of the diagnostic captures both the mean and the variance of the convergence over the set of chains,

\hat{R}_{RMS} = \sqrt{\mu_{\hat{R}}^{2} + \sigma_{\hat{R}}^{2}},    (4.28)

which is essential since the reduction factor can be greater or less than one, and simply computing the expectation can indicate convergence where none exists. Including the variance in the calculation ensures that all chains have converged appropriately. When the posterior predictive Markov process has converged, the mean of the reduction factor set equals one and the variance is negligible; the combined MPSRF reduction factor is equal to one. In order to minimize the number of Markov chains required, this work computes the MPSRF reduction factor for the posterior predictive Markov process independently of the actual implementation of the proposed SSD algorithm.

4.5 Precision-based Sample Size Determination

4.5.1 Quality Control and Acceptance Sampling

The goal of a manufacturing process is to produce a set of items for a designated purpose. The quality of a given item in the lot refers to some measurable property that determines its "goodness" or suitability for a given purpose. This might be the diameter of a washer, the weight of a ball bearing, the breaking strength of thread, et cetera (Johnson, 1994). Ideally, the goods produced by a manufacturing process should be identical and at an exact quality specification; in reality, the quality of each item produced differs from all others in the lot. Therefore, one can mathematically describe the items produced by a manufacturing process as independent random events from a continuous probability distribution describing the item quality. This distribution is unknown at the outset of the manufacturing process and can only be determined by analyzing the distribution of the quality of the items that it produces. To ensure the suitability of the product for its designated purpose, quality engineers quantify the quality of a manufacturing process using two key tools: control charts and acceptance sampling.

Figure 4.03: Control Chart for a Random Fabrication Process.
The control chart allows the engineer to visually assess the control of a fabrication process from the quality of the items it produces. Each item should fall between the LCL and UCL boundaries, and one describes the control of the fabrication process by the fraction of product that falls between these boundaries. The process is "under control" if only a small fraction fails to fall between the control limits.

The primary attribute of a manufacturing process is control, which ensures that the product generated is consistently of reliable quality that does not exceed defined limits. The objective of quality control is the tuning of the manufacturing process to achieve optimal efficiency, which reduces the waste caused by the production of defective merchandise. To evaluate the degree of control of a manufacturing process, quality engineers examine the output of the process using control charts, demonstrated in Figure 4.03. The control chart monitors the quality of each manufactured unit individually, plotting it as a series with the item number on the abscissa and the item quality on the ordinate. The central line indicates the quality specification of the process, while a pair of boundaries, UCL and LCL, designate the upper and lower control limits. Samples that fall between these limits are acceptable, while those that fall outside of them are not. The quality requirements of the manufacturer and the consumer dictate these boundaries. Ideally, a manufacturing process should produce only acceptable samples. However, demanding absolute control might be impractical depending on the precision of the process and the required level of quality. Therefore, an engineer considers the control exhibited by a process acceptable if the fraction of samples falling outside of the boundaries is insignificant, falling below some predefined threshold (e.g., 2%).

However, just because a manufacturing process is under control does not guarantee that a specific lot of manufactured product will always satisfy the quality control specification. This is because each lot represents a relatively small sample that might not accurately reflect the control distribution (Figure 4.04); however, a lot is typically too large to be efficiently examined piece-by-piece to ensure compliance. The most common way to determine the acceptability of a specific lot of manufactured product is acceptance sampling, in which one examines a subsample of the lot and accepts or rejects the entire lot based on the quality composition of the sample. This requires the development of a decision plan that accepts bad material with a low probability, designated by the lot tolerance fraction defective, p_t′ (Duncan, 1959). The value of p_t′ marks the division between a "good" and a "bad" sample of material and designates the highest admissible fraction of defective merchandise for a lot to be considered acceptable. The probability of accidentally accepting a given lot with a higher fraction of defective items than p_t′ is called the consumer's risk, β. The risk arises from the inability to measure the quality of each of the items in the lot individually; the decision is a guess based on the sample and has some probability of being incorrect. For example, in a group of one thousand items, the engineer might examine one hundred, find all of them acceptable, and accept the lot assuming that the other nine hundred are represented by the sample.
In reality, all of the uninspected items might be defective. The risk indicates that under the current plan, the probability that an "acceptable" lot of size L contains more than Lp_t′ defectives is less than 100β percent.

Figure 4.04: Sparse Sampling from a Manufacturing Distribution. Even when a manufacturing process is under control, not every sample of product will adhere to the quality specification. This example shows a controlled process of N(0,1) with 5% LCL and UCL at -2 and 2. However, small manufactured lots may not exhibit this control; in this instance, the control of two of the three lots (designated by C_α) fails to achieve 95% control. The colored markers indicate the normalized histograms of the respective samples.

Acceptance sampling accepts or rejects the lot by estimating whether a specific production sample adheres to the quality specification: one samples from the lot (which is itself a sample from the control distribution) and decides, based on the quality of this subsample, whether the entire lot exhibits adequate quality. This decision is based on statistical hypothesis testing (Johnson, 1994), which tests the one-sided hypothesis that the actual fraction defective in the lot equals p_t′ against the null hypothesis that it does not. The engineer determines the rejection number for each lot at a given sample size based on the acceptable consumer's risk, β, and the given quality threshold, p_t′. Then, one examines a small subsample from the lot and counts the number of defective items. If this number meets or exceeds the rejection number, the lot is rejected. Otherwise, the inspector assumes that the lot meets the quality requirement and accepts it. If one properly selects the rejection number for the sample size, this decision rule assures that the risk of accepting a lot containing more than a fraction p_t′ of defectives is reduced to β.

4.5.2 Application of Quality Control to Experiment Design

Key parallels exist between manufacturing and Bayesian analysis using Markov chain simulation that one can exploit to gain new insight into the evaluation of experiment designs. The processes are remarkably similar; instead of a random sequence of items produced by a manufacturing process that adhere to a quality distribution, the Markov chain consists of a random sequence of values that correspond to the posterior distribution. In both instances, the analytical expression for the generating distribution is unknown, and the researcher must perform analyses based entirely on samples drawn from it. Manufacturing and analysis also share a common objective: to produce a distribution that is very precise with regard to the target quality value or MAP estimate. Because of these similarities, concepts such as the quality of a manufactured product, control of the production process, and lot acceptability all have posterior Markov chain analogues that provide the basis of the proposed method of sample size reduction.

Similar to the quality of items produced by a manufacturing process, Bayesian analysis regards the model parameters as random variables described by the posterior probability density. One can imagine the values of a posterior Markov chain as a group of manufactured product, where each link in the chain represents an item with a given attribute; rather than an item with a given weight, diameter, et cetera, there are posterior samples with associated parameter values.
However, unlike the manufacturing scenario, where the target specification is determined by the manufacturer, the target specification in the experimental case is determined by the system under study and is defined as the best point estimate of the parameters, α̂. In this work, this value takes on the location of the highest posterior mode, which corresponds to the MAP estimator. The quality of the i-th value in the posterior Markov chain, A_i, is determined by how far it deviates from the MAP estimate in terms of its relative error,

q_i = 1 - \left( \frac{A_i - \hat{\alpha}}{\hat{\alpha}} \right),    (4.29)

exactly as the quality of a manufactured item is defined by the difference between its attributes and the target specification. The requirements of a given experiment dictate how low the quality of a link in the posterior Markov chain may fall before one considers it "defective". This plays an important role in determining the limits for quality control.

Establishing control over a random process is paramount to ensuring consistent quality. In the manufacturing scenario, the engineer brings the process under control by manipulating the production until only a small fraction of items falls outside a pair of control boundaries. In the case of Bayesian experimentation, one can adjust an experimental protocol so that a large fraction of the posterior samples falls within a control boundary defined on either side of the estimate. This is loosely equivalent to the notion of a posterior credibility interval (Section 2.4.2), and it ensures that the MAP estimator exhibits a given degree of precision in the parameter estimates. A researcher can establish control over the posterior distribution in four principal ways:

1. Decrease the experimental (measurement) error.
2. Reduce the prior covariance.
3. Improve the design to provide more information.
4. Increase the sample size.

This work assumes that the experimenter has already carried out the first three steps; one will always reduce the experimental noise as much as possible, use all of the available prior information, and use an appropriately optimal design to carry out the experiment. By adhering to the first three points and systematically increasing the sample size of the experiment, the investigator can increase the precision of the posterior Markov chain (and thus decrease the width of the posterior distribution), and bring the experiment "under control". Figure 4.05 illustrates how the estimator precision increases with the sample size.

Figure 4.05: Posterior Precision and the Control Requirement. The same experiment can yield vastly different posterior distributions for different sample sizes. A given posterior distribution satisfies the quality requirement when some fraction of its probability falls between two boundaries (vertical dotted lines) at a relative distance from the mode (in this case, 10% of its magnitude). Low sample sizes yield extremely imprecise posterior distributions (green plot). As the sample size increases, the posterior becomes more precise, eventually satisfying the control requirement (red plot).

Quality engineers typically assume that a manufacturing process can be refined to reduce the number of defective items to negligible levels, and therefore they do not often explicitly measure the degree of control of the process. This marks a point of divergence between manufacturing and Bayesian analysis, since one cannot always achieve this high degree of precision in an experimental scenario.
Therefore, the ability to quantify the precise degree of control exhibited by the posterior distribution can be extremely useful. The control exhibited by a posterior density, C_α(z), is defined as the probability that a given point estimate from the posterior distribution will fall within a control region, R. Using posterior sampling, this expression reduces to the posterior area or hypervolume that falls within the control region:

C_\alpha(z) = \int_{R} p(\alpha \mid z)\, d\alpha = \frac{1}{L} \sum_{i=1}^{L} \mathbb{I}(A_i \in R).    (4.30)

The symbol \mathbb{I} denotes the indicator function, which is equal to one when a given random sample, A_i, falls within R and zero otherwise. The equivalence between the expressions in (4.30) arises from the relationship between a probability density function and a large sample drawn from it; by definition, the integral of the probability density within R indicates the probability that some event A_i will occur in that region. Therefore, if L independent random samples are drawn from p(α|z), each will fall within the boundary with probability C_α(z), and if L is sufficiently large, approximately LC_α(z) of them will fall within R. This method of pdf integration is especially useful in posterior simulation; although the exact analytical form of the posterior density is unknown, it is still possible to measure its control using samples from a posterior Markov chain.

Proper designation of the control limits is critical in order to provide adequate control over the random process without imposing an unnecessarily strict regulation. In the manufacturing scenario, the engineer defines the control limits based on practical considerations imposed by the intended use of the product; an item whose quality falls outside a given region cannot be used for its intended purpose. A similar rationale follows for the experimental scenario, in which the researcher defines the control region, R, based on the required or desired estimator precision. For a single-parameter problem, one can explicitly define the control region as the range of posterior values that fall within a fixed interval described by a tolerance, ρ:

A_i \in R \quad \text{if} \quad \left| \frac{A_i - \hat{\alpha}}{\hat{\alpha}} \right| \le \rho,    (4.31)

where A_i represents the i-th element in the posterior Markov chain. The value of ρ falls between zero and one and represents the minimum allowable relative precision for the parameter estimates; the value typically corresponds to a low percentage of the magnitude of the best point estimate. In practice, one evaluates the control of each posterior Markov chain for multiple values of ρ, which provides a range of control information without increasing the computational complexity.

Figure 4.06: Elliptical Control Regions with Maximum Boundary Points. This figure illustrates the two- and three-dimensional elliptical control regions used in this work. The center of each region (red circle) is placed at the mode of the posterior distribution. The radii of the region (yellow circles) establish the lower and upper control boundaries of the region and correspond to a fraction of the magnitude of the posterior mode. The region can be expanded into more dimensions using the generalized ellipse equation (4.32).

When the experiment employs multiple parameters, the posterior distribution becomes a multidimensional density and requires a more complicated control boundary defined in P-dimensional space.
While the simplest solution is a rectangular prism, which integrates the chain between sets of linear boundaries similar to (4.31) for each parameter, a more compelling and useful option is the ellipsoid, a rounded region that supports a more evenly defined tolerance around the estimate. Figure 4.06 illustrates examples of two- and three-dimensional elliptical regions and their respective equations. In each case, the center of the elliptical region rests at the MAP estimate and extends in each cardinal direction by the values defined in (4.31), while the off-axis region exhibits curvature that rejects samples that stray far from the mode in more than one dimension. The multivariate elliptical control region for an experiment is defined as

A_i \in R \quad \text{if} \quad \sum_{j=1}^{P} \left( \frac{A_{i,j} - \hat{\alpha}_j}{\rho\, \hat{\alpha}_j} \right)^{2} \le 1,    (4.32)

where A_{i,j} represents the i-th link in the Markov chain for the j-th parameter, and α̂_j is the mode of the posterior distribution at the j-th parameter. In the special instance that the boundary values are equal in every dimension, the ellipsoid equation reduces to that of a circle or spheroid. In addition, this expression reduces to (4.31) for single-parameter experimental models, which eliminates the need for multiple boundary formulas based on the number of parameters in the experimental model.

As mentioned in Section 4.3, one can use Markov chain methods and posterior predictive simulation to design experiments. Since the data vector, z, has not been collected prior to experimentation, the design challenge requires that the researcher simulate a series of potential observations from the preposterior distribution, p(z), and make some sort of decision about the experiment design based on the whole of possible experimental outcomes. However, the prior art that pools the posterior predictive Markov chains together presents some insurmountable limitations. First, the marginal posterior distribution is very broad and is impossible to reduce to a tolerable degree of precision. This distribution also does not reflect the relationship between specific data sets and the parameter estimates that they produce; to assess the control exhibited by a given posterior distribution, the Markov chains must be kept separate and not combined. In addition, while one must simulate many different possible data sets from the preposterior distribution to account for the experimental uncertainty, each stands on its own as an individual and independently conducted experiment, and only one of these data sets can be the "real" one that corresponds to the system under study. Therefore, it is reasonable to look at the posterior Markov chain generated from each set of data on its own and consider each as a potential experimental result, rather than examining the population of posterior Markov chains as a cohesive whole.

Once a series of K independent posterior Markov chains is generated from independent samples from the preposterior distribution, one can determine the mode and compute the boundary region for each chain individually. For each chain, the researcher computes the control according to (4.30) as the fraction of samples falling within the elliptical boundary region defined by (4.32); a short sketch of this computation follows. At this point, the results of the individual chains must be combined into a score that reflects the whole of experimental outcomes.
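The per-chain control computation is a direct implementation of the indicator sum in (4.30) with the elliptical region of (4.32). The sketch below assumes the chain is stored as a vector of P-dimensional samples and that the mode α̂ has already been located (for example, by the kernel density estimation step referenced in Figure 4.08); the names are illustrative and are not taken from the XRedDesign API.

```cpp
#include <cstddef>
#include <vector>

// Fraction of posterior samples inside the elliptical control region, per (4.30) and (4.32).
// mcChain[i][j] is the j-th parameter of the i-th link; vMode is the MAP estimate alpha-hat;
// fRho is the relative tolerance (e.g., 0.10 for a boundary at 10 percent of the mode).
double computeControl(const std::vector<std::vector<double>> &mcChain,
                      const std::vector<double> &vMode, double fRho)
{
    const std::size_t nLength = mcChain.size();
    const std::size_t nParams = vMode.size();

    std::size_t nInside = 0;
    for (std::size_t i = 0; i < nLength; ++i) {
        double fSum = 0.0;
        for (std::size_t j = 0; j < nParams; ++j) {
            double fTerm = (mcChain[i][j] - vMode[j]) / (fRho * vMode[j]);
            fSum += fTerm * fTerm;              // ((A_ij - mode_j) / (rho * mode_j))^2
        }
        if (fSum <= 1.0)                        // indicator I(A_i in R) from (4.32)
            ++nInside;
    }
    return double(nInside) / double(nLength);   // C_alpha(z) from (4.30)
}
```

Evaluating this routine once for each of the K simulated data sets produces the K control values on which the decision criteria below operate.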
The first proposed decision criterion bases itself on the expected control (precision) of the MAP estimate over all of the posterior distributions from the K experimental outcomes, and it is defined as

d_1 : Q \ge Q_0,    (4.33)

where

Q = \mathbb{E}_{z}\{C_\alpha(z)\} = \mathbb{E}_{z}\left\{ \int_{R} p(\alpha \mid z)\, d\alpha \right\}.    (4.34)

The optimal sample size according to this criterion is the smallest for which (4.33) is true. In this case, the researcher chooses Q_0 as a sufficiently high value between zero and one that corresponds to the credibility interval that, on average, should fall within the region R. This work uses a fixed value of 0.90 for Q_0, corresponding to the ninety percent credibility interval. According to this criterion, the researcher increases the sample size until the average control over all possible experimental outcomes is considered acceptable.

While the criterion in (4.33) provides a concise and straightforward decision rule for sample size determination, it is incomplete. Since only one of the posterior Markov chains used in (4.33) can represent the actual system, the expectation might not adequately predict the outcome of the experiment. For example, if 40% of the K posterior predictive Markov chains are controlled (at C_α = 0.99) and 60% are uncontrolled (at C_α = 0.85), the average precision is an acceptable 0.91 in spite of the fact that the posterior for any given data set is more likely to be uncontrolled than controlled. Nevertheless, the expected control in (4.33) provides a general guideline for SSD and a solid lower limit on the sample size for the experiment.

4.5.3 Posterior Predictive Simulation and Virtual Acceptance Sampling

The ultimate objective of the proposed technique of sample size determination is to ensure that the data collected from the experiment yield a posterior distribution that satisfies the precision requirement. Therefore, what truly interests the researcher is not the average control over all experimental outcomes, but rather the probability that any given experimental outcome will be under control. Since only one of the K simulated data vectors will correctly characterize the experiment, one should consider the probability that a given posterior predictive chain will be under control,

p\left(C_\alpha(z) \ge Q_0\right) = p\left( \int_{R} p(\alpha \mid z)\, d\alpha \ge Q_0 \right) > 1 - p_t',    (4.35)

for any z randomly taken from the preposterior distribution. The lot tolerance fraction defective, p_t′, describes the maximum allowable fraction of uncontrolled posterior predictive Markov chains; an acceptable experiment will exhibit at worst a 100p_t′ percent chance that the posterior distribution will not satisfy the precision requirement. This provides the foundation for an even stricter decision criterion than (4.33), one that assures the precision of the parameter estimates for the observed data. However, even the decision rule in (4.35) exhibits some implementation hazards. The preposterior distribution can yield a virtually infinite number of possible observations for a given experiment, and the relatively high dimension of z and the computational complexity of generating posterior predictive Markov chains typically preclude the use of a K value high enough to generate a representation of p(z) adequate for implementing (4.35) directly. This problem is similar to that faced by quality engineers who must estimate the acceptability of an entire lot of manufactured product based on the number of defective items in a relatively small sample.
The experiment design scenario adapts the acceptance-sampling procedure to measure the risk incurred by approving the design, based on the number of "defective" posterior predictive Markov chains in the population. The researcher must determine the fraction of defective Markov chains in the population and compute the risk associated with assuming that (4.35) is true.

The second proposed SSD decision criterion mirrors the process of acceptance sampling to ensure that the precision requirement in (4.35) is satisfied with a given confidence. This method also examines a population of K posterior predictive Markov chains using simulated data from the preposterior distribution, p(z). This population of chains is a sample from the nearly infinite set of possible experimental outcomes, and it is directly analogous to a small sample extracted from a large lot of manufactured product. As with acceptance sampling, this decision rule employs a hypothesis test to determine the researcher's confidence that the experiment satisfies the expression in (4.35). The risk, β, corresponds to the decision error and indicates the probability of incorrectly accepting the experiment design when (4.35) is actually false. The test involves counting the number of chains that fail to satisfy the precision requirement and computing the risk as the area in the tail of a normal approximation to the binomial distribution (Figure 4.07).

Figure 4.07: Consumer's Risk from Acceptance Sampling. A relatively small sample taken from a population might not adequately reflect the statistics of the entire population. The consumer's risk describes the probability of mistakenly accepting a population that actually does not satisfy the precision requirement based on a sample that does. One computes the risk as the tail probability (red area) of the standard normal distribution up to the value Z defined in (4.38).

One accepts the experiment design that marginalizes the risk of incorrectly accepting (4.35) to below a threshold value, β_0, according to

d_2 : \beta \le \beta_0,    (4.36)

where

\beta = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{Z} \exp\left(-0.5 x^{2}\right) dx    (4.37)

and

Z = \frac{\sum_{i=1}^{K} \mathbb{I}\left(C_\alpha(z_i) < Q_0\right) - K p_t'}{\sqrt{K p_t' \left(1 - p_t'\right)}}.    (4.38)

The risk begins at 100%, indicating the absolute falsehood of (4.35), and decreases toward zero. The speed at which β tends to zero is affected by the size of K; larger values of K require less guesswork and produce a very certain, step-like risk function, while smaller values of K cause a more gradual change in β. The optimal sample size according to this criterion is the smallest N for which (4.36) is true. At this point, any posterior Markov chain produced from the experiment has a greater than 1 − p_t′ probability of satisfying the precision requirement, with only a 100β_0 percent chance of error. This represents a stricter criterion for sample size determination, one that provides a strong assurance that the credibility intervals of the parameter estimates will satisfy the precision requirement.

4.6 Implementing the Decision Criterion

Figure 4.08 illustrates the implementation of the proposed criterion for sample size determination. The purpose of this method is to determine the optimal sample size, N*, defined as the smallest possible sample size for an optimal design, x*, that satisfies one or both of the decision criteria in (4.33) and (4.36), corresponding to expected control and marginalized risk, respectively; a sketch of how both rules are evaluated from the K control values appears below.
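Given the K per-chain control values from (4.30), both decision rules reduce to a few lines of arithmetic. The sketch below evaluates d_1 from (4.33) and d_2 from (4.36) through (4.38), computing the tail probability in (4.37) with the standard-normal CDF; the function and variable names are illustrative rather than taken from the XRedDesign API.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct SSDDecision {
    double fExpectedControl;   // Q from (4.34)
    double fRisk;              // beta from (4.37)
    bool   bD1;                // expected-control rule, (4.33)
    bool   bD2;                // marginalized-risk rule, (4.36)
};

// vControl holds the K per-chain control values C_alpha(z_k) computed from (4.30).
SSDDecision evaluateCriteria(const std::vector<double> &vControl,
                             double fQ0, double fPt, double fBeta0)
{
    const std::size_t nK = vControl.size();

    double fSum = 0.0;
    std::size_t nDefective = 0;                 // chains with C_alpha < Q_0
    for (double fC : vControl) {
        fSum += fC;
        if (fC < fQ0) ++nDefective;
    }

    SSDDecision d;
    d.fExpectedControl = fSum / double(nK);     // Q = mean of the control values

    // Normal approximation to the binomial count of defective chains, (4.38).
    double fZ = (double(nDefective) - double(nK) * fPt)
                / std::sqrt(double(nK) * fPt * (1.0 - fPt));
    // beta = Phi(Z), the standard-normal tail probability in (4.37).
    d.fRisk = 0.5 * std::erfc(-fZ / std::sqrt(2.0));

    d.bD1 = (d.fExpectedControl >= fQ0);        // d1: Q >= Q_0
    d.bD2 = (d.fRisk <= fBeta0);                // d2: beta <= beta_0
    return d;
}
```

With Q_0 = 0.90 and illustrative choices such as p_t′ = 0.05 and β_0 = 0.05, a candidate sample size is accepted under d_2 only when the number of uncontrolled chains falls far enough below p_t′K that the risk of wrongly approving the design drops below the threshold.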
The "control" of a given posterior predictive Markov chain corresponds to the relative precision of its parameter estimate; the chain is "under control" when Q_0 of the posterior probability falls within an elliptical region whose radii are 100ρ percent of the absolute value of the MAP estimate. At the optimal sample size, the actual data from the experiment should yield a posterior Markov chain that satisfies the precision requirement for the parameter estimates. This work defines the decision criteria for sample size determination such that a researcher can evaluate both simultaneously using the same population of posterior predictive Markov chains. This allows one to decide on a sample size using both criteria without increasing the computational complexity.

The process begins by computing a design with P observations using any of the available design criteria for the given prior distribution and experimental variance. This corresponds to the minimum sample size for the model. Next, one must generate K random samples from the preposterior using the method outlined in Section 4.3:

1. Draw a random sample, α, from the prior distribution, p(α).
2. Compute the model prediction, y, at the design value for the parameter α.
3. Randomly generate an error term, v, from N(0,g), where g is defined by the variance model.
4. Simulate an observation vector, z, by adding y and v according to (2.01).

At this point, a posterior predictive Markov chain is generated for each of the K simulated data sets using the Independence Metropolis-Hastings algorithm. The distribution reflected in each of these chains represents a possible experimental result at that sample size. The researcher can determine the control for each posterior chain by determining the fraction of posterior samples that fall within the boundary region according to (4.32). The data at this stage consist of a series of K scalars corresponding to the control exhibited by each chain.

The next step involves analyzing the control values for each of the SSD decision criteria. The expected control in (4.34) is computed as the average of the K control values, and the first decision criterion is satisfied when this value meets or exceeds the control threshold, Q_0. Next, the researcher must count the number of chains whose control value falls below Q_0 and compute the risk for the sample size using (4.37) and (4.38) for a given value of p_t′. The second decision criterion in (4.36) is satisfied when the risk drops below the threshold β_0. When the experimental predictions for the current sample size meet both of these decision criteria, the researcher can terminate the procedure and declare that sample size the optimum. Otherwise, one must compute a new optimal design for the next largest sample size and repeat the procedure, continuing until the sample size reaches a predetermined maximum, N_MAX.

If the proposed sample size exceeds N_MAX without satisfying both decision criteria, then one of a number of problems could exist with the framework of the experiment design. For example, the prior or experimental error might be too large, or the design method might be insufficient. If this occurs, the researcher must terminate the search and consider one of the following alternatives:

1. Determine N* using the expected control instead of the risk.
2. Use the value of N at the point at which the risk or quality curve begins to converge.
Reject the experiment design methodology and retry using differently chosen designs or a smaller prior distribution and experimental error variance.

The first alternative is less desirable than using a sample size that marginalizes the risk, but it may still produce credibility intervals for the parameter estimates that fall within the precision requirement, depending on the "true" parameter value and the experimental error. Even if it does not, an experiment using this sample size might reduce the parameter uncertainty enough to propose a future experiment that can satisfy the risk criterion. The second option likewise does not substitute for finding a true optimal design, but the convergence of both the expected control and the risk indicates that further increasing the sample size does not improve the experimental precision beyond this point. If neither criterion has converged, or if one has converged to a value very far from its threshold, some aspect of the experimental framework is flawed. At this point, the researcher should consider redesigning the experiment with a different prior distribution, variance parameters, or methodology for determining optimal designs.

[Figure 4.08 flowchart: starting from the minimum sample size N = P, compute the optimal design X*, simulate z ~ p(z) (Eqs (4.15) to (4.18)), generate each chain A_k with the IMH algorithm (Figure 4.01), find its mode by KDE (Section 5.3.4), integrate A_k over the control region R (Eqs (4.31) and (4.32)) to obtain the control C, and apply decision rules d_1 (expected control, Eq (4.33)) and d_2 (risk, Eq (4.36)); if both are satisfied, set N* = N, otherwise increase N and repeat until N exceeds N_MAX, in which case the search fails and the experiment must be modified as described above.]

Figure 4.08: Flowchart of the Proposed Method for Sample Size Determination. Candidate designs are computed for increasing sample sizes. For each candidate design, the algorithm generates K posterior predictive Markov chains, computes the precision of each using an elliptical control region, and combines these values across the chains using the decision rules d_1 and d_2. The optimal sample size is the smallest value of N that satisfies one or both of the SSD decision rules, depending on the experiment and the required estimator precision.

Chapter 5 Methods of Implementation

5.1 Programming and Naming Conventions

Due to the high computational demands of experiment design and Markov chain simulation, this work developed a proprietary software package to facilitate the quick and effective design and SSD of a wide variety of experiments. This software, called XRedDesign (short for Experimental Reduced Design), is written in object-oriented C++ and consists of a modular, hierarchical application programming interface (API) with which a researcher can design, simulate, and evaluate a wide variety of experiments. The program code contains a set of core classes corresponding to an experiment, prior distribution, models, and design algorithms, as well as data storage classes and functions that carry out many conventional mathematical operations. This takes advantage of the object-oriented nature of the language: additional user-defined models, functions, and design and decision criteria can be implemented to provide virtually limitless flexibility through a simple, easy-to-understand interface (Figure 5.01).
All of the results described in this work were obtained using a program built on the XRedDesign API, which is available to the research community under an open-source license agreement. The explanations and descriptions provided in this chapter do not delve into the low-level details of the software architecture, but they do assume that the reader has a working understanding of the C++ language. In particular, knowledge of object-oriented programming and class inheritance is critical to understanding the XRedDesign API and its application to various experiments.

Because the result of an operation between two variables depends on their individual data types, the specific data class of each variable plays an important role in algorithm development. For example, division between integer variables will always discard any fractional result and return only the integer part (e.g., 3/2 = 1). Subtraction between unsigned values poses a similar pitfall, since negative numbers are not represented and "wrap" back to the largest representable value (e.g., 4 − 5 = 4294967295). The addition of new data storage classes for matrices, vectors, and Markov chains only increases these hazards. To maintain clarity, the XRedDesign source code employs a simplified variant of Hungarian notation (Simonyi, 1999), in which each variable name begins with a lower-case prefix that indicates its data type. However, unlike other versions of Hungarian notation that attempt to label every possible data type (Petzold, 1998), the convention used in this work specifically targets type-mixing and allows compatible types to share prefixes; non-data variables do not require a prefix at all. For example, "float" and "double" both use an f prefix because they are both floating-point types. This greatly reduces the number of prefixes while making potential type-mixing problems easier to anticipate. Table 5.01 lists the prefixes and their corresponding data types used in the XRedDesign API.

Figure 5.01: XRedDesign API Syntax. This example illustrates the program code required to initialize the experimental framework, then compute an optimal design and sample size. Since the API code performs most of the computations, the main program code that the researcher must supply is minimized. One should refer to Appendix C for the actual XRedDesign-derived code used by this work.

Numeric Types                              Data Types
Type             Prefix   Example          Type             Prefix   Example
unsigned int     n        nIndex           char, byte       c        cOption
int, long        i        iIndex           string           sz       szFileName
float, double    f        fMean            bool             b        bFlag
vector           v        vMean            Markov chain     mc       mcPosterior
matrix           m        mCovariance

Table 5.01: Prefix Notation Used for Variable and Class Types in the XRedDesign API. This table demonstrates the highly reduced form of Hungarian notation used in this work, which flags conflicting data types that might produce unexpected results when combined. For example, division between integers discards the fractional part of the result and returns another integer (e.g., int(3)/int(2) = 1, NOT 1.5).

5.2 Architecture of the Application Programming Interface

The XRedDesign API takes specific advantage of the object-oriented nature of the C++ language. Object-oriented programming is built on the combination and interaction of classes and structures, which contain data about an object along with the functions and operators that manipulate this data.
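To make the convention concrete, the short, self-contained C++ example below reproduces the two pitfalls cited above (integer division and unsigned wrap-around) using variable names that follow the prefixes of Table 5.01; the variables themselves are hypothetical and do not appear in the XRedDesign source.

#include <iostream>

int main()
{
    int iCount = 3;                                          // "i" prefix: signed integer
    int iTotal = 2;
    double fNaive   = iCount / iTotal;                       // integer division: 1.0, not 1.5
    double fCorrect = static_cast<double>(iCount) / iTotal;  // promotes to floating point: 1.5

    unsigned int nSmall = 4;                                 // "n" prefix: unsigned integer
    unsigned int nLarge = 5;
    unsigned int nWrap  = nSmall - nLarge;                   // wraps to 4294967295 (32-bit unsigned)

    std::cout << fNaive << " " << fCorrect << " " << nWrap << std::endl;
    return 0;
}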
There are two types of classes: base classes, which provide a general foundation for a series of computations, and derived classes, which expand upon the base class with additional customizations tailored to a specific task. The derived class shares an "is-a" relationship with its base class and automatically inherits all of the data members and functions of its base class. This hierarchy allows the programmer to write generic application code and to place code that applies to all derived types in the base class, which reduces the amount of redundant code in the program. The software designed for this work employs a series of core base classes that include general design- and framework-based classes for experiments, experimental and variance models, and prior distributions, as well as data- and computation-based classes for structures such as matrices, vectors, and Markov chains. This section provides a brief description of the major core and data objects in the XRedDesign API (Figure 5.02); Appendix A provides detailed documentation of the software structure, including its classes, member functions, and attributes.

[Figure 5.02 class map: data classes (CVector, CMatrix, CMarkovChain, arrayMC, CParam); the experiment and mesh classes (CExperiment, CMesh); experimental models (CExpModel, with derived CExpDecay, CRiseFall, CSigmoid, and user-defined classes); variance models (CVarModel, with CConstVAR, CQuadrVAR, CPowerVAR, and user-defined classes); prior distributions (CPrior, with CDeltaPrior, CUniformPrior, CNormalPrior, CLognormalPrior, and user-defined classes); optimal design criteria (CInfoCriterion, with CInfoD, CInfoED, CInfoEID, CInfoBD, and user-defined classes); sample size criteria (CSizeCriterion, with CSizeControl, CSizeRisk, CSizeCombined, and user-defined classes); and control regions (CControlRegion, with CROIRectangle and CROIEllipse).]

Figure 5.02: Mapping of Object Relationships in the XRedDesign API. The API consists of data structures and classes (light blue, Section 5.2.1), model classes (green, Section 5.2.2), and algorithm classes (orange, Section 5.2.3). Black lines indicate inheritance between base and derived classes, red lines indicate that a class object is a member of the other class, and blue lines indicate that an object of the given class is created and employed by a given function. Appendix A contains a more detailed treatment of the XRedDesign API and the interplay of its various object classes.

5.2.1 Vector, Matrix, and Markov Chain Operators

The API features a number of new data-type classes that facilitate operations not supported by the default data types in C++. In particular, objects corresponding to vectors and matrices are required to execute the proposed decision criteria for sample size determination. The XRedDesign API contains the classes CVector and CMatrix to store one- and two-dimensional data structures and to handle operations associated with vector and matrix arithmetic. In addition to basic operations such as addition, subtraction, and multiplication, these classes contain a number of linear algebraic operations such as the matrix transpose, determinant, inverse using Gaussian elimination, vector sorting using the QUICKSORT algorithm (Hoare, 1961), and magnitude computation. Additionally, the CMatrix class performs the Cholesky decomposition on a positive definite matrix for use in the generation of multivariate normal random deviates, and can compute the eigenvalues and eigenvectors of a given matrix.
To facilitate the loading of numerical data into the matrix, each of these classes contains a function called string2data that allows the user to use a string to input matrix and vector data in the form of “1 2 3; 4 5 6”, with semicolons separating the rows of the matrix (similar to the format used by MATLAB). The CVector and CMatrix classes are the most commonly used objects in the XRedDesign API, and provide the foundation for higher-level objects and operations in this software. The CMarkovChain class manages Markov chains in the software, particularly with statistical analyses on their samples. In this work, a Markov chain object consists of a series of CVector objects of length P, each of which corresponds to a random sample from the posterior distribution. The function grow initiates the generation of a Markov chain of a specified length from a given experiment object and observation vector using the Independence Metropolis Hastings algorithm. Once a Markov chain has been grown, the CMarkovChain class contains built-in functions that can compute statistics such as the mean, covariance, percentile, and mode of a Markov chain. An algorithm also exists to integrate the Markov chain over a region of interest in the parameter space, which is critical to computing the control exhibited by a given posterior distribution. Additionally, the XRedDesign API features a structure called arrayMC that can store multiple parallel Markov chains and pass them into functions to carry out subsequent analysis. This is 79 especially important in algorithms that employ multiple parallel Markov chains, such as the MPSRF diagnostic and in posterior predictive simulation. 5.2.2 The Experimental Framework The primary core object in the software API is the CExperiment class, which contains all information about the Bayesian framework of the experiment, including the sample size, input dimension, number of parameters, the experimental and variance models, the prior distribution, and current design matrix. This class acts as the command center for all calculations involving the experiment, which includes determining the optimal design and sample size, performing simulation of experiments, and computing parameter estimates and experimental analysis from an observed or simulated data set. After the researcher has created a CExperiment object in the program, these activities can be performed using the designOpt, designSSD, simulate, and estimator (either estimateMLE or estimateMAP) functions. The researcher can also compute the information of an experiment for a given design and information criterion using the function info. The foundation of a Bayesian experiment is the designation of the experimental and variance models. The XRedDesign API includes the virtual base classes CExpModel and CVarModel, which represent the experimental and variance models, respectively. The main CExperiment class contains a data pointer to a single CExpModel and a single CVarModel object, which can be accessed directly through getExpModel and getVarModel, or indirectly through the simulate member function. The software provides derived classes for the three major variance models discussed in Section 2.2.3 and the experimental models used in the evaluation of this work (Chapter 6); a researcher can also derive additional classes to include new experimental and variance models, allowing the software to accommodate any experimental scenario. 
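As suggested above, adding a new model amounts to writing a small derived class. The sketch below outlines what a user-defined single-exponential model might look like; the base-class interface, the eval signature, and the CParam accessors shown here are assumed for illustration and may differ from the actual XRedDesign headers documented in Appendix A.

#include <cmath>
#include "XRedDesign.h"   // hypothetical header name

// A user-defined experimental model: y(x; alpha, theta) = theta_1 * exp(-alpha_1 * x),
// mirroring the behavior of the built-in CExpDecay class.
class CMyDecay : public CExpModel
{
public:
    // eval() is assumed to receive the stimulus vector and the combined parameter
    // object and to return the scalar system response, as described in the text.
    double eval(const CVector& vInput, const CParam& param)
    {
        double fAlpha1 = param.alpha(0);   // informative parameter (accessor assumed)
        double fTheta1 = param.theta(0);   // fixed parameter (accessor assumed)
        return fTheta1 * std::exp(-fAlpha1 * vInput(0));
    }
};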
The base model object remains very abstract in structure; each class contains data members defined in the function init that denote the number of informative parameters, P, fixed parameters, Q, and the dimension of the model input, M, as well as information about the constraint applied to the input domain. The function eval generates a model prediction for a given stimulus vector and parameter set and returns the system 80 response as a floating-point scalar. The CParam class is a universal parameter structure that combines the α, θ, and σ parameter vectors into a single object that one can easily pass into the model. Although the model must adhere to this framework of inputs and outputs, the eval function can apply any operator to its inputs, which can include solving a system of differential equations, employing numerical methods to solve a specific problem, or performing complex evaluations based on multiple computations. Therefore, the researcher can tailor the experimental model to the specific goal of the experiment, regardless of the complexity of the model. Since the XRedDesign API employs a completely computational architecture without analytical shortcuts, the model function can even represent a nonlinear system. The virtual base class CPrior manages the prior distribution of the experiment. Since most of the design functions within the CExperiment class reference CPrior in their calculations, proper designation of the prior distribution is especially critical. This class contains members for the mean vector and covariance matrix of the distribution and contains prototypes for computing the pdf value at a given value (pdf) and the generation of random vectors from the distribution (rand). The program supports different prior distributions by deriving classes from CPrior; by default, the software includes three derived classes called CUniformPrior, CNormalPrior, and CLognormalPrior that correspond to the multivariate uniform, normal, and lognormal prior distributions. CPrior also contains code to discretize the prior and store it as a CMesh object, which, as discussed in Section 2.3.1, provides a stationary representation of the prior that streamlines the computation of the EID- and ED-optimal design criteria in (2.19) and (2.20). One initializes the CMesh object for a given prior distribution using the initMesh member function of the CPrior class, which computes the nodes and weights of a discrete multivariate distribution. A critical aspect of the Bayesian framework is the generation and management of posterior simulations, which in this work take the form of Markov chains. The CExperiment class also contains wrapper functions for generating posterior Markov chains for the experiment (posterior) and for computing the MPSRF diagnostic (mpsrf), which determines the convergence of a posterior simulation to a stationary distribution for both the posterior and posterior predictive instances. This allows one to determine the stationary length 81 of the experiment’s Markov process before computing the parameter estimate or carrying out the sample size determination. 5.2.3 Optimal Design and Sample Size Criteria Experiment design often involves the determination of different criteria to evaluate the suitability of a given design. For optimal design, an algorithm must maximize the information provided by a design over the input space. For the determination of the optimal sample size, the algorithm must determine the smallest sample size that provides a minimal degree of precision. 
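Before turning to the criterion classes in detail, it is worth sketching how these pieces combine in a main program of the kind shown in Figure 5.01. Every constructor and argument list below is assumed for illustration; only the member-function names (designOpt, designSSD, simulate, estimateMAP) are taken from the description above.

#include "XRedDesign.h"   // hypothetical header name

int main()
{
    // Assemble the Bayesian framework from built-in classes: an experimental model,
    // a variance model, and a prior distribution.
    CExpDecay    expModel;
    CConstVAR    varModel;
    CNormalPrior prior;

    // The CExperiment object acts as the command center for all calculations.
    CExperiment experiment(expModel, varModel, prior);

    // Compute an optimal design, then run the precision-based SSD of Chapter 4.
    experiment.designOpt();
    experiment.designSSD();

    // Simulate one data set from the resulting design and compute the MAP estimate.
    CVector vData  = experiment.simulate();
    CVector vAlpha = experiment.estimateMAP(vData);

    return 0;
}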
The virtual base classes CInfoCriterion and CSizeCriterion, which correspond to experimental information and precision, serve the purpose of evaluating a candidate design and providing a decision to accept or reject it based on the given criterion. The researcher can add additional criteria for optimal design and sample size determination by creating derived classes from the appropriate base class. Optimal design involves computing a measure of the information provided by the experiment at a given candidate design, and globally optimizing to determine that which provides the maximum amount of information. The CInfoCriterion base class contains the criterion function that indicates the degree of information provided by a candidate design. This class contains two key functions. The first function, build, runs once at the beginning of a design optimization and performs one-time initialization tasks such as constructing the prior CMesh object. The second function, eval, evaluates the information criterion at a given design and is called iteratively by the global optimization routine. Another function, fisher, is called iteratively by eval and computes the Fisher Information Matrix in (2.16) for a given design and set of parameters. This degree of modularity allows the maximal recycling of code; the global optimization and Fisher Information Matrix code appears once, allowing each of the respective eval functions to remain small, occupying about ten lines of code each. The XRedDesign API contains derived CInfoCriterion classes for D-, ED-, and EID- optimal designs, but one can easily implement other design criteria by adding additional derived classes. 82 The determination of the optimal sample size of an experiment requires a researcher to compute the estimator precision across a population of parallel Markov chains generated from the preposterior distribution. The CSizeCriterion virtual base class generates and analyzes the precision of Markov chains as required by the precision-based SSD criterion. This class contains a function, init, in which the researcher defines the size and shape of the control region for Markov chain integration. The provided classes derived from the CControlRegion base class are CROIEllipse and CROIRectangle, which define ellipsoidal and rectangular prismatic regions in the parameter space. Unlike the optimal design of the experiment, this criterion does not automatically make a sample size decision for the researcher. Rather, it outputs the degree of control of each chain to an output file, and allows the researcher to make the final decision regarding the sample size. This allows more freedom, as some experiments might not provide sufficient precision to achieve the desired consumer’s risk and a researcher might have to make compromises in the sample size used, or redesign the experiment more efficiently to achieve better precision. 5.3 Other Computational Considerations 5.3.1 Domain Constraints An experimental design often consists exclusively of positive inputs; time-dependent experimental models are causal and input stimuli such as concentration, frequency, fluid flow, and displacement often have positive magnitude. In addition, experimental models sometimes exhibit insensitivity to small changes in input stimulus, where one must apply an extremely wide range of inputs to evaluate the entire response surface. 
In these cases, one can constrain the design space to hasten the convergence of the global optimizer and anchor the inputs to the correct domain, preventing optimization drift toward illegal input values. The XRedDesign API includes native support for both positivity and logarithmic constraints, but a researcher can add new constraints with little additional effort by adding new types and definitions to the existing constrain and deconstrain functions in the CExperiment base class.

The software implements a given constraint by applying a transformation to the model input values at various times, as illustrated in Figure 5.03. At the beginning of the optimization, this method creates an intermediate design variable, X_c, produced by transforming the real input, X, according to the constraint. The domain of X_c covers all real numbers, which allows seamless optimization of the objective function. The algorithm optimizes over the value of X_c and transforms it back to X upon completion. Each experimental model applies the inverse transformation to its input before processing, so that the predicted model response corresponds to the correct input X, and not the constrained input X_c. In this way, the program sees the model prediction for the positive design values, not for the intermediate variable. One can also use this methodology for parameters during MLE and MAP estimation to speed convergence and ensure that the resulting estimates are reasonable, as long as the parameter values are known to be positive. Using a constrained domain, the global optimization algorithm can search the input space 2^N times faster than when using an unconstrained input domain.

Figure 5.03: Domain Constraints for Experimental Models. To hasten the convergence of the optimal design, the program restricts the input domain to only its critical values. The method takes the form of either a positivity or logarithmic constraint, which applies a square root or log10 function at each "constrain" block and its inverse at each "deconstrain" block. This ensures that the optimization algorithm operates on the constrained space while the criterion and return value reflect the full domain.

5.3.2 Derivatives and Gradients

To calculate the information criterion of an experimental design, one must be able to compute the partial derivatives of the model function with respect to the various parameter values, because all of the relevant criteria make use of the sensitivity matrix in (2.17). This work computes each partial derivative numerically using the forward-difference approximation, defined as

\frac{\partial f(x;\alpha)}{\partial \alpha} = \frac{f(x;\alpha + h) - f(x;\alpha)}{h},    (5.01)

where

h = \sqrt{EPS} \cdot \max\!\left(\alpha, \sqrt{EPS}\right).    (5.02)

The variable EPS represents the machine epsilon: the smallest number the computer can add to the number one such that the sum is recognized as greater than one. This value varies between computers, but is usually about 10^-16. This choice ensures that the step size, h, is as small as can be used safely on the given computer. The researcher can compute the sensitivity matrix with relative ease by computing the derivative with respect to each parameter for each of the N observations.

5.3.3 Random Number Generation

Random number generation plays a critical role in this work. In fact, one can see by simple inspection that optimal design and the proposed decision rule for sample size determination require an exorbitantly large number of unique random deviates.
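A direct transcription of (5.01) and (5.02) is only a few lines of C++; the sketch below differentiates a generic scalar function of one parameter and is an illustration of the arithmetic rather than the XRedDesign sensitivity-matrix code.

#include <algorithm>
#include <cmath>
#include <functional>
#include <limits>

// Forward-difference approximation of df/d(alpha), following (5.01), with the
// step size h derived from the machine epsilon as in (5.02).
double forwardDifference(const std::function<double(double)>& f, double fAlpha)
{
    const double fEps = std::numeric_limits<double>::epsilon();                 // ~1e-16 for double
    const double fH   = std::sqrt(fEps) * std::max(fAlpha, std::sqrt(fEps));    // step size from (5.02)
    return (f(fAlpha + fH) - f(fAlpha)) / fH;
}

Repeating this computation for each parameter at each of the N design points fills in the sensitivity matrix column by column.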
Table 5.02 displays the approximate number of random draws required for each stage of this work; the optimal design requires almost 10^9 random samples, while the posterior predictive simulation used for SSD requires over 10^12 random samples. Since most algorithms that produce random deviates will begin to repeat themselves well before this value, one must take care to ensure that the random number generator has a sufficient period to cover this range. Computation time is a factor in this work as well, since the algorithms are complicated enough that a slow random generator would ensure that the calculations never complete in a reasonable time.

Stage of Evaluation   Random Samples                                        Total
Optimal Design        (M·N)(trials)(MaxIt) = (2)(90)(100)(15000)            2.70E+09
IMH                   (P+1)(L+B) = (4)(55000)                               2.20E+07
SIM                   (P + M·N) = (3 + 2·90)                                5.40E+02
MPSRF                 (K)((2^P)(IMH) + SIM) = 5000(8·2.2E+07 + 540)         8.80E+11
SSD                   (K)(IMH + SIM) = 50000(2.2E+07 + 540)                 1.10E+12
Evaluation            (evals)(IMH + SIM) = (2500)(540 + 2.2E+07)            5.50E+10

Table 5.02: Random Number Requirements for Each Algorithm. To ensure that the computer-generated random sequences used in each algorithm do not repeat, the periodicity of the random number generator must exceed the maximum number of samples required. This table indicates that the generator algorithm should be able to produce more than 10^12 random samples to avoid accidental repetition.

The XRedDesign API employs the random number generator of L'Ecuyer with Bays-Durham shuffle, adapted from Press et al. (1997, p. 282), which generates uniformly distributed scalar deviates between zero and one. This algorithm has a long period that exceeds 10^18 samples, which ensures enough random values to carry out all of the algorithms in this work without repetition. These values are transformed into normal deviates using the Box-Muller transformation (Box and Muller, 1958), which converts a pair of U(0,1) deviates into a pair of N(0,1) deviates according to

X_1 = \sqrt{-2\ln U_1}\,\cos(2\pi U_2), \qquad X_2 = \sqrt{-2\ln U_1}\,\sin(2\pi U_2).    (5.03)

U_1 and U_2 are uniform deviates between zero and one, and X_1 and X_2 are normal deviates with mean zero and unit variance. Using random scalars from U(0,1) and N(0,1), one can generate random vectors from each of the three prior distributions incorporated by this work. To compute a random vector, X, from the multivariate normal distribution with mean vector μ and covariance matrix Σ, one must first generate a P-dimensional vector of independent standard normal deviates, Z, using the Box-Muller method. The independent samples can be linked through the covariance matrix and shifted toward the mean using the relation

X = \mu + \mathrm{chol}(\Sigma)\,Z,    (5.04)

where chol(Σ) represents the Cholesky decomposition of the covariance matrix and

Z_i \sim N(0,1)    (5.05)

for i = 1 … P. Since mathematicians commonly refer to the Cholesky decomposition as the "matrix square root", one can see how this procedure is analogous to the transformation of N(0,1) samples to N(μ,σ²), which involves multiplying a standard normal deviate by the standard deviation and adding the mean. Computing lognormally distributed random vectors follows a similar methodology. Since a lognormal random variable is simply the exponential of a normally distributed random variable with mean, u, and covariance, S, as described by (2.08) and (2.09), one must only generate a normal variate according to (5.04) and take the exponential of it. In other words,

X_{LN} \sim L(\mu, \Sigma) \quad \Longleftrightarrow \quad X_{LN} = \exp\{X_N\}, \quad X_N \sim N(u, S).    (5.06)
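The chain of transformations in (5.03)–(5.05) is summarized by the self-contained C++ sketch below. It substitutes std::rand for the long-period L'Ecuyer generator used by XRedDesign and assumes the lower-triangular Cholesky factor of Σ has already been computed (for example by the CMatrix class), so it illustrates the arithmetic rather than the actual implementation.

#include <cmath>
#include <cstdlib>
#include <vector>

// One pair of independent N(0,1) deviates from a pair of U(0,1) deviates, per (5.03).
void boxMuller(double fU1, double fU2, double& fX1, double& fX2)
{
    const double fPi = 3.14159265358979323846;
    const double fR  = std::sqrt(-2.0 * std::log(fU1));
    fX1 = fR * std::cos(2.0 * fPi * fU2);
    fX2 = fR * std::sin(2.0 * fPi * fU2);
}

// Multivariate normal deviate X = mu + chol(Sigma) * Z, following (5.04) and (5.05).
// mChol holds the lower-triangular Cholesky factor, indexed as mChol[row][col].
std::vector<double> multivariateNormal(const std::vector<double>& vMu,
                                       const std::vector<std::vector<double> >& mChol)
{
    const std::size_t P = vMu.size();

    // Fill Z with independent standard normal deviates, generated pairwise.
    std::vector<double> vZ(P);
    for (std::size_t i = 0; i < P; i += 2) {
        double fU1 = (std::rand() + 1.0) / (RAND_MAX + 2.0);   // stand-in uniform generator
        double fU2 = (std::rand() + 1.0) / (RAND_MAX + 2.0);
        double fX1, fX2;
        boxMuller(fU1, fU2, fX1, fX2);
        vZ[i] = fX1;
        if (i + 1 < P) vZ[i + 1] = fX2;
    }

    // Shift and correlate: X = mu + chol(Sigma) * Z.
    std::vector<double> vX(P);
    for (std::size_t i = 0; i < P; ++i) {
        vX[i] = vMu[i];
        for (std::size_t j = 0; j <= i; ++j)
            vX[i] += mChol[i][j] * vZ[j];
    }
    return vX;
}

Taking the element-wise exponential of the resulting vector then yields a lognormal deviate, exactly as in (5.06).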
If one stores the normal-space mean and covariance in memory for easy access, the generation of multiple repeated lognormal deviates is only slightly more complex than computing normal deviates. Generating random vectors from the multivariate uniform distribution is the simplest task of all. Since the multivariate uniform (rectangular) distribution restricts its covariance to diagonal matrices, the samples from each dimension are independent; one computes a P-variate uniform vector by drawing P random values from the univariate uniform distribution whose mean and variance are defined by the respective element of the multivariate mean vector and the diagonal of the covariance matrix:

X_{U,i} \sim U(\mu_i, \Sigma_{ii}), \qquad X_{U,i} = A_i + (B_i - A_i)\,U_i, \qquad U_i \sim U(0,1),    (5.07)

where A_i and B_i represent the lower and upper boundaries of the uniform distribution in the i-th dimension, defined from the mean and covariance according to (2.11) and (2.12). Using these methods, one can easily convert a series of random uniform scalars into samples from any one of many multivariate probability distributions for use in global optimization, experiment simulation, and Markov chain generation.

5.3.4 Finding the Mode of a Markov Chain

This work requires the MAP point estimate for a set of observations, which corresponds to the highest mode of the posterior distribution. When one can define the posterior analytically, finding the mode simply involves using an optimizer to find the maximum value of the pdf. However, this straightforward method will not suffice when using posterior simulation. The most intuitive method of finding the posterior mode from a set of random samples is to generate its histogram, but this process exhibits several theoretical and computational deficiencies that make it impractical. First, the resolution of the histogram is dictated by the random sample size, since there must be enough samples to fill each bin adequately; over-resolving the histogram destroys its features. Expanding the histogram into multiple dimensions is difficult, since the number of required bins and samples increases exponentially with the dimension of the data. Finally, the histogram is discontinuous, which makes it useless for the precise determination of the mode of a data sample. Therefore, a better method of reconstructing a density from a random sample is required. This work uses the kernel density estimator (KDE) to estimate the posterior density from a Markov chain (Silverman, 1998), and finds the mode by maximizing the pdf estimate over α. KDE reconstructs a probability density from a set of random samples by summing a group of smaller densities centered at each sample, called kernels, and then smoothing with a filter of some optimally defined window width (Figure 5.04). Since posterior simulation produces random samples from the posterior distribution, one can use KDE to reconstruct the posterior distribution from the Markov chain. The estimate of the posterior distribution at α generated from the posterior Markov chain, A, and kernel function K(·) is

\hat{p}(\alpha \mid z) = \frac{1}{L h^{P}} \sum_{i=1}^{L} K\!\left( \frac{\alpha - A_i}{h} \right)    (5.08)

for chain length L and window width h. The kernel is chosen as a unimodal and symmetric P-variate probability distribution whose integral over the P-dimensional parameter space equals one; the Gaussian normal distribution is a common choice of kernel.
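In one dimension with a Gaussian kernel, (5.08) reduces to a few lines of C++. The sketch below is a simplified scalar illustration rather than the multivariate implementation used in this work; its window-width helper is the familiar P = 1 rule of thumb, scaled by the chain's standard deviation.

#include <cmath>
#include <vector>

// One-dimensional Gaussian kernel density estimate at alpha, the scalar form of (5.08).
// vChain holds the L samples of a univariate Markov chain; fH is the window width.
double kde1D(const std::vector<double>& vChain, double fAlpha, double fH)
{
    const double fNorm = 1.0 / std::sqrt(2.0 * 3.14159265358979323846);
    double fSum = 0.0;
    for (double fSample : vChain) {
        const double fU = (fAlpha - fSample) / fH;
        fSum += fNorm * std::exp(-0.5 * fU * fU);   // Gaussian kernel K(u)
    }
    return fSum / (vChain.size() * fH);
}

// Rule-of-thumb window width for P = 1, scaled by the chain's standard deviation.
double window1D(double fStdDev, std::size_t nLength)
{
    return fStdDev * std::pow(4.0 / 3.0, 0.2) * std::pow(static_cast<double>(nLength), -0.2);
}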
If the spread of the data differs dramatically across dimensions, each dimension requires its own smoothing window to obtain optimal smoothing of the density estimate, which complicates the computation considerably.

Figure 5.04: Posterior Reconstruction from a Markov Chain using KDE. By representing a posterior density as the sum of a series of "kernels" at each random sample (only the first five kernels are shown), an estimate of the posterior distribution can be computed that closely reflects its analytically determined value. One can find the mode of the posterior distribution by maximizing over the reconstructed density.

Fukunaga (1972) proposes a novel approach that eliminates the need for multiple values of the window width by rescaling the random data and kernels. This method works by transforming the data to have unit covariance, smoothing with a radially symmetric kernel, and then transforming back. Using this method, the density estimate in (5.08) becomes

\hat{p}(\alpha \mid z) = \frac{1}{L h^{P} (2\pi)^{P/2} \sqrt{\det S}} \sum_{i=1}^{L} \exp\!\left\{ -\frac{(\alpha - A_i)^{T} S^{-1} (\alpha - A_i)}{2 h^{2}} \right\}.    (5.09)

For the Gaussian kernel, S corresponds to the covariance matrix of the Markov chain, A. This causes the kernel to reflect the shape of the data and provides the best possible fit between the actual posterior density and the estimate. One determines the optimal value of the smoothing window width, h, by minimizing the mean-squared error of the density estimate for the radially symmetric kernel. For the Gaussian kernel, the optimal window width becomes

h = \left( \frac{4}{P + 2} \right)^{1/(P+4)} \left( \frac{1}{L} \right)^{1/(P+4)}.    (5.10)

By using this value for the smoothing of the density estimate, the researcher ensures that the data will be neither filtered too harshly, which can lead to loss of density features such as modes, nor under-filtered, which can cause artifacts and discontinuities in the curvature of the estimate. Once the estimate of the posterior density has been determined, computing the mode is a relatively simple task; one must find the parameter vector, \hat{\alpha}, that maximizes the posterior density estimate over the parameter space. The mode-finding algorithm employed in this work uses the Nelder-Mead simplex method to maximize over the parameter space with the posterior density estimate as the criterion function. The optimization algorithm returns a mode of the posterior density as the optimum. Because the simplex method is a local optimizer, there is a chance that the algorithm will fail to find the tallest posterior mode. However, this does not pose a problem, since it is not important to find the tallest mode of the density: the existence of multiple modes coincides with a dramatic reduction in posterior precision, and any posterior distribution that can satisfy the proposed precision criterion will contain only a single mode. Consequently, the speed of convergence of the simplex method far outweighs any inconvenience caused by computing suboptimal modes, making it the ideal optimizer choice at this stage.

5.4 Software Validation

To ensure that the XRedDesign API provides accurate results and is free of programming errors, it was subjected to a validation routine that examines each level of programming and compares it to both analytically verifiable and previously published results. Table 5.03 summarizes the validation process for the software.
This procedure consists of seven stages, each of which assesses the integrity of a different class of operators and consists of a number of design and analysis tasks that explore the functionality of the corresponding algorithms. Appendix B contains the complete results for all stages of validation, where the "control" value for each test is printed in black ink and the results computed by the XRedDesign API are printed in red; in every case, these values are equivalent. What follows in this section is a description of the validation process and the various design tasks required at each stage.

The first stage of validation focuses on the ability to generate random samples and evaluate pdf values from a given prior distribution for each of the three supported densities. This involves twelve trials: four repetitions for each of the normal, lognormal, and uniform prior distributions using three different values for the mean and covariance. The first task involves evaluating the pdf for twelve trials and comparing the results provided by the program to evaluations of the pdf computed analytically and verified in MATLAB. The second task involves drawing a large random sample consisting of 500,000 draws from each of the twelve prior distributions and computing the mean and covariance of the sample. At each trial, the sample mean and covariance agree with the statistics of the generating distribution. In addition, the histogram was computed in MATLAB for each of the univariate and bivariate priors, adjusted to represent the probability density at each node, and displayed against the analytically determined pdf surface. The validation routine confirms the accuracy of the random number generator through the agreement between the adjusted histogram of the random samples and the pdf surface.

Stage   Target                   Objective                                       Trials
1       Prior and Random Gen     random generation, pdf, statistics              36
2       Information Criteria     evaluate the design criteria, validate CMesh    16
3       Global Optimization      % correct over six criteria functions           600
4       Literary Comparison      compare optimal designs in literature           4
5       Markov Chains            generate chains, compute statistics             10
6       Parameter Estimation     compare estimator results to MATLAB             15
7       Chain Integration        integrate chains over elliptical ROIs           2

Table 5.03: Validation Stages for the XRedDesign API Algorithms. To ensure the accuracy of the results of this work, the XRedDesign API was subjected to a rigorous seven-stage validation routine that examined each facet of experimentation individually. Multiple trials were performed to guarantee the robustness of the results. Appendix B contains a more detailed treatment of the validation procedure for this work, including the results of the individual trials.

Stage two of the validation procedure focuses on confirming that the criterion functions involved in the optimal design provide an accurate representation of the experimental information. This includes the CInfoCriterion base class and the derived classes for D-, ED-, and EID-optimality. The first task involves the evaluation of the different criteria at single design points for various prior distributions and experimental models. Some of the trials involving the D criterion can be validated analytically; the others are evaluated in MATLAB using analytically defined derivatives and either Monte Carlo integration over 25,000 evaluations or the weighted mesh, depending on the number of parameters, to handle the expectation.
This stage validates the information criterion functions by proving that the results generated by the program agree with those provided analytically and using MATLAB. The next stage of validation ensures that the optimization algorithm used to find the optimal design accurately finds the global solution by employing a series of objective functions commonly used in the global optimization literature. These objective functions support multidimensional optimization in a single dimension; to optimize over an N-by-M design matrix, a dummy information function was constructed that computes the mean value of the objective function at each 1-by-M design point over N iterations. The validation routine computes the optimal design using the random creep algorithm for each of the six global optimization criteria for one hundred trials started from randomly selected initial values. The validation routine does not apply any tuning to the algorithm between the different criteria, instead using a single set of tuning parameters that is robust across any possible design criterion. The optimization accuracy is measured for each criterion function by the fraction of trials that find the global optimum. In five of the six cases, the optimization succeeds more than ninety percent of the time; the sixth case shows optimization accuracy of seventy-six percent. After a second pass through the optimizer, the accuracy rises to above ninety-five percent. This result shows that the optimization algorithm employed by the software will find the globally optimal design for a wide range of information criteria and models. The fourth stage of the validation routine involves computing the optimal designs for a variety of case studies taken from the literature, and validating based on agreement between the software and the published 92 results. The initial task involves computing the D- and EID-optimal designs for the single-parameter exponential decay model using a different inputs and prior distributions. These results are easily validated by analytically computing the optimal design for each of these cases. The second task involves replicating the optimal designs published by Box and Lucas (1959) for the two-parameter difference of exponentials, Bezeau and Endreyni (1986) for the three-parameter Hill, and Pronzato and Walter (1985) for the two- parameter exponential decay. In each of these works, the authors published optimal designs for these models for a given prior distribution. The optimal designs computed by the software for each of these experiments agrees with the results published in each of these papers, proving that the XRedDesign program produces accurate optimal designs. The fifth stage of validation shifts the focus from optimal design to Markov chain generation and analysis. For a series of ten experiments consisting of different models, prior distributions, and data vectors, the validation routine generates the Markov chain using the IMH algorithm. Then, the histogram was determined for each univariate and bivariate Markov chain, adjusted to correlate to the discrete pdf, and compared to the analytically determined posterior distribution. All ten trials illustrate a high correlation between the histogram of the Markov chain and the analytically determined posterior distribution; this indicates that the IMH algorithm implemented in the XRedDesign API accurately simulates random samples from the posterior distribution. 
After validating the IMH algorithm for each trial, the validation routine examines the functions that perform statistical analysis on the Markov chain. The program computes the measures of central tendency (mean, mode, and median), percentiles (15 th , 45 th , and 85 th ), and chain diagnostics (Autocorrelation and MPSRF) for each Markov chain. To generate a set of “control” values, one can easily compute the mean, median, percentile, and auto-correlation of the random sequence using the statistics toolbox in MATLAB. To compute the mode, the validation routine uses the MATLAB fminsearch function to maximize an analytically defined posterior distribution. Finally, the control value for the MPSRF is computed in MATLAB using the mpsrf function provided in the Markov chain diagnostics toolbox released by Sarkka and Vehtari (2004). This stage of validation shows that the XRedDesign API can precisely compute a Markov chain that closely reflects its target posterior distribution, 93 and that one can accurately compute all of the various statistics and diagnostics of the Markov chain that this work requires. The sixth stage of validation evaluates the parameter estimation algorithms in the CExperiment class. The routine verifies the estimateMLE and estimateMAP functions using twelve trials each from the exponential decay and difference of exponentials and six trials from the Hill sigmoid model. Each trial employs a different observation vector representing various degrees of experimental error. The XRedDesign software computes the parameter estimate for each trial using the MLE and MAP estimators and the result is compared to both the “true” parameter value that generated the data set and control values computed in MATLAB by optimizing a pair of criteria corresponding to the likelihood function and posterior distribution using the fminsearch function. For each trial, the parameter estimates computed by the XRedDesign program tend to correlate with the “true” parameter value with varying success; depending on the noise in the data, the estimate (especially MAP) sometimes drifts. However, one can discount any concerns this may generate, since the XRedDesign estimates agree with the MATLAB-generated control values very closely. This indicates that the estimation errors encountered are likely the result of the experimental framework (incorrect prior distribution, noisy data, et cetera), and not caused by the actual estimation algorithm. This stage of validation demonstrates that the parameter estimator functions provided by the XRedDesign API compute parameter estimates that at worst compare favorably with those determined by commercial programs, and at best are very close to the “true” parameter values. The final stage of validation focuses on the algorithms used to perform integration of a control region, which are critical for sample size determination. This validation consists of two trials that deal with a two- and three-dimensional parameter space. The validation routine begins by generating a Markov chain corresponding to a uniform distribution using the function growToPrior. In practice, the integration routine will run on a posterior Markov chain; however, for validating the integration algorithm, a Markov chain that reflects the prior distribution allows one to control its distribution more precisely and ensure equal distribution of samples over a given region in space. 
The idea behind this validation is that the uniform 94 samples in the Markov chain distribute evenly over a rectangular region bounded by the limits of the uniform distribution. If one superimposes a smaller region on this large rectangle, the fraction of samples that fall within it is equal to the ratio in area or volume between the control and large regions, which one can easily compute using geometry. The software creates objects corresponding to elliptical and rectangular control regions and integrates the uniform Markov chain over each of these, computing the integral of the chain over the control region exactly as done in the proposed SSD algorithm. The results validate the integration algorithm; the fraction of samples that fall within the control regions in each trial is always equal to the ratio of area between the regions. The positive results of the validation routine inspire a great deal of confidence in the algorithms of the XRedDesign API, particularly since the inability to separate the difference between software bugs and legitimate phenomena provides a major stumbling point of most numerical analysis. In particular, sample size determination and experiment design often provide information that the researcher might find troublesome or unintuitive. An algorithm may fail to find an optimal sample size or the optimal design may consist of a series of unintuitive measurements that might have serious implications and provide ample information about the experimental system. By rigorously validating the software, one can be assured any numerical phenomena have legitimate causes that warrant serious investigation and are not simply the result of software bugs. 95 Chapter 6 Evaluation of the Proposed Algorithm 6.1 Phases of Evaluation To illustrate the legitimacy of the proposed algorithm for precision-based sample size determination, this work evaluates it in action for a variety of purposes. Based on the challenges of biomedical experimentation and design objectives discussed in the first chapter, the proposed SSD method must exhibit the following requirements: • The algorithm must work as predicted. Estimates computed from data at the optimal sample size must exhibit the desired level of experimental precision, exhibited by the credibility interval. • The optimal sample size should correlate to the diminishing marginal utility of the relative error of the experiment; the error should no longer be rapidly decreasing as the sample size increases. • The decision criterion should account for the effects of different prior distributions and measurement error. The researcher should be able to infer experimental system behavior based on designs that fail to satisfy the SSD decision rules. • Additionally, the method must be invariant to the specific observation, z, and parameter vector, α, taken from the preposterior and prior distributions, respectively. It must also be able to estimate a large number of informative parameters. • The proposed method must be robust enough that it can be applied to a wide variety of nonlinear experimental models and work in an actual laboratory environment. It must blend seamlessly into the existing experimental procedure to reduce the redesign of these protocols. • Experimental results from the optimal sample size must be comparable to those already used in practice. The optimal sample size should be smaller than the naïve design for the same experiment. 
96 Since the evaluation of some of these points requires a controlled environment and repeated trials and others need a practical laboratory environment, one cannot evaluate the algorithm for all of them using a single test. Therefore, this work employs two distinct phases of evaluation that focus on different classes of requirements. The two distinct phases of evaluation allow a rigorous examination of the proposed SSD algorithm. The first phase consists entirely of computer simulations, which provide a great deal of control over the experimental variables and allow the execution of multiple repetitions for each experiment for different scenarios. This allows this work to check the algorithm for invariance across α and z. In addition, computer simulation permits the manipulation of the Bayesian framework, allowing the precise definition of prior distributions and variance parameters. The second phase of evaluation provides an opportunity to demonstrate the application of the proposed SSD algorithm to actual experiments conducted by the faculty at the University of Southern California. This confirms its suitability for use in the laboratory setting and demonstrates how a researcher might apply the method in practice on a much wider scale. 6.2 Phase One: Computer Simulation 6.2.1 Method of Evaluation The initial stage of evaluation provides a proof-of-concept demonstration using simulations for three experimental scenarios commonly found in various areas of science and engineering. Computer simulation permits stricter control over the models, parameters, and measurement error than is possible with a real-life experiment, and allows one to isolate specific variables to study the behavior of the proposed SSD criterion. Performing experiments “in silico” also permits massive repetition of experimental trials and allows rigorous testing across the prior and preposterior distributions. This also provides the only means to evaluate the relative error of a parameter estimate directly, since one knows the “true” parameter vector used to generate a given set of observations. To study the behavior of the proposed method, test it for invariance across the prior and preposterior distributions, and determine the expected error for each sample size, this work takes advantage of the control and stable framework provided by computer simulation. 
[Table 6.01 layout: for each experimental model (Exponential Decay, trials 1:01–1:04; Rise and Fall, trials 2:01–2:04; Hill Sigmoid, trials 3:01–3:04), the table lists the prior mean vector μ, the prior covariance matrix Σ, the coefficient of variation (CV %), the 95% interval of each element of α, the fixed parameters θ, and the variance parameters σ for that trial; the likelihood is Gaussian and the variance model is parabolic.]

Table 6.01: Framework for Simulated Experiments for First Phase of Evaluation. This phase of evaluation involves twelve experimental trials that correspond to three experimental models. Each model uses combinations of low and high prior covariance and low and high experimental error. Each of the trials uses a lognormal prior distribution and the parabolic variance model.

The simulation phase of evaluation employs a set of twelve experiments that span three different experimental models. All of the experiments in this phase employ the parabolic variance model in (2.15) and the lognormal prior distribution. Each of the three models yields four trials, corresponding to the combinations of "low" and "high" prior covariance and "low" and "high" error variance; the twelve trials are numbered 1:01 to 3:04, where the number before the colon indicates the experimental model and the number after the colon designates the individual trial for that model. Trial one makes use of low covariance and noise, trial two uses low covariance with high noise, trial three employs high covariance and low noise, and trial four utilizes high covariance and high noise. By isolating the prior covariance and experimental error, one can determine the effect of each of these variables on the determination of the optimal sample size. Table 6.01 summarizes the specific parameter values used in each of these trials.

Each of the twelve experimental trials consists of three stages, exhibited in Figure 6.01. In the first stage, this work must determine optimal sample sizes for both the control- and risk-based decision rules for four increasingly large control regions.
A series of EID-optimal designs are computed for each experiment for twenty different sample sizes based from the number of parameters, ranging from P to 30P. For example, a one-parameter experiment computes EID-optimal designs for samples sizes of 1, 2, …, up to 30 observations, a two-parameter experiment computes EID-optimal designs for sample sizes of 2, 4, …, up to 60 observations, and so on. These optimal designs maximize the information at their respective sample size and are the best possible design candidates for the SSD algorithm. Next, this stage determines the Markov chain lengths required to achieve a stationary distribution for the P, 10P, 20P, and 30P experiments using the MPSRF diagnostic covered in Section 4.4 for five thousand parallel Markov chains of length L. A constant burn-in of five thousand samples is used for each Markov chain. Finally, this work computes the expected control and risk at each of the twenty candidate designs for the elliptical control regions corresponding to ρ = 0.05, 0.10, 0.15, and 0.20. A series of curves that illustrate the evolution of the experimental precision and risk as the sample size increases are constructed. At each trial, eight optimal sample sizes are determined in all: one for each of the four control regions for each decision rule. 99 Evaluate for 5000 trials 20 trials each case Figure 6.01: Phase One of the Evaluation Procedure. The evaluation for each trial consists of three stages: (1) computing the optimal sample size for each of the control regions at each EID-optimal design, (2) determining the 90% credibility interval for each optimal sample size, and (3) constructing a standard error and information curve for each trial to examine the relationship between the optimal sample sizes and the expected accuracy of the experiment. Stages 2 and 3 confirm the design specification of the SSD criteria. The second stage of evaluation for each of the twelve first-phase trials involves ascertaining the precision of MAP parameter estimates computed at parameter values spread out over the prior distribution. For each experimental model, this work employs four parameter vectors taken from regions in the prior distribution with both high and low probability of occurrence. At each of the eight optimal sample sizes for each trial, twenty random data sets are simulated from the candidate design and the MAP estimate and 90% credibility interval are computed from the posterior Markov chain for each repetition; the average MAP point estimate and widest credibility interval over the set of repetitions is recorded for each N * . According to the design criterion, ninety percent of the posterior probability must fall within the fraction ρ of the mode. In other words, the 90% credibility interval should not span a range greater than 100ρ percent of the mode value. This compares (1) the size of the credibility intervals to the size of the control region at each 100 parameter value, and (2) the credibility intervals of all control regions across the different parameter values. This should validate the association between the control region and the resulting credibility interval and verify the invariance of the proposed SSD method to the actual parameter values and observed data. This clearly illustrates the utility and proves the dependability of the proposed SSD criterion. 
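For a single parameter, the interval check described above can be sketched in a few lines of C++; the percentile indexing below is simplified, and the mode is assumed to come from the KDE step of Section 5.3.4, so this is an illustration of the test rather than the code used in this work.

#include <algorithm>
#include <cmath>
#include <vector>

// Tests whether the equal-tailed 90% credibility interval of one parameter spans
// no more than 100*rho percent of the mode value, as required above.
bool precisionSatisfied(std::vector<double> vSamples, double fMode, double fRho)
{
    std::sort(vSamples.begin(), vSamples.end());
    const std::size_t L = vSamples.size();

    const double fLower = vSamples[static_cast<std::size_t>(0.05 * (L - 1))];   // 5th percentile
    const double fUpper = vSamples[static_cast<std::size_t>(0.95 * (L - 1))];   // 95th percentile

    return (fUpper - fLower) <= fRho * std::fabs(fMode);
}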
The final part of the first evaluation phase examines the proposed optimal sample sizes and their relation to the diminishing marginal utility of the information and estimator accuracy. The information provided by the design at each sample size is computed from the negative logarithm of the EID criterion at each sample size. This provides an increasing information curve that corresponds to the format seen in Figure 3.02. The accuracy at each sample size is determined by randomly selecting a parameter vector from the prior distribution and simulating an observation vector from the candidate design at the given sample size for five thousand trials. This work estimates the parameters using the MLE and MAP estimators based on this data set and computes the relative estimation error for the set of P parameter estimates according to

ERR = (1/P) Σ_{i=1}^{P} |α_{0,i} − α̂_i| / α_{0,i},   (6.01)

where α_0 is the "true" parameter value and α̂ represents the parameter estimate. The average relative estimation error over the whole set of trials reflects the accuracy of the experiment at the given sample size and asymptotically approaches zero as the experiment becomes more accurate. This work displays the standard curves for information and accuracy and investigates the relation between the diminishing marginal utility and the locations of the optimal sample sizes; it is expected that the proposed SSD algorithm will recommend sample sizes at which the information and accuracy curves have begun to level off from their initial trajectories.

6.2.2 Experimental Models

The first phase of evaluation employs three experimental models common to a wide variety of engineering problems, especially those found in biological and medical research. Since the intended purpose of the proposed method is to design experimental protocols for actual experiments, ensuring that the models used for simulation are appropriate for realistic experimentation is critical. Table 6.02 contains the names and a brief list of applications of each model used in this phase of evaluation. These lists are not exhaustive by any means, but they demonstrate some of the real-world applications of the mathematical models used in this work. This section discusses each of the experimental models employed by the simulation phase of this work.

Exponential Decay: radioactive decay of an isotope; Fick's law (passive diffusion); Newton's law of cooling; single-tank mixing problem; current in an RC circuit.
Exponential Rise and Fall: overdamped mass-spring system; two-tank mixing problems; current in a series RLC circuit; alcohol metabolism; amount of ES-complex in an enzyme reaction; Stewart-Hamilton method (determines cardiac output).
Hill Sigmoid: binding of oxygen to hemoglobin; enzyme kinetics (Michaelis-Menten and Briggs-Haldane); hormone-receptor interactions; effectiveness of a drug; isometric tension in myocardium; binding of ligands; protein denaturation.

Table 6.02: Phase One Experimental Models and Their Applications. The three experimental models used in the first phase of evaluation were chosen because they apply to a wide range of scientific problems; each of these models either solves a fundamental differential equation or, in the case of the sigmoid, describes a basic chemical process.
While this table does not provide an exhaustive list of model applications, it illustrates the wide range of applications that one can use with the proposed SSD algorithm. 102 EXPONENTIAL DECAY The simplest experimental model used for in this evaluation is the exponential decay curve, defined as ( ) ( ) 11 ;, exp yxx αθθ α =−, (6.02) where the parameter α 1 represents the decay constant, which regulates how quickly the curve tends to zero from its starting value (Figure 6.02). Since θ 1 indicates the initial value of the function, this work regards it as a known constant. This model is commonly used to describe occurrences such as the radioactive decay of an isotope used in dating artifacts (Libby, 1955), as an approximation of heat transfer by Newton’s law of cooling (Wong, 2003), or to describe passive diffusion of a solute across a semi-permeable membrane (West, 1990). The wide utility of the model arises from the fact that it solves the commonly occurring first- order differential equation ( ) ( ) 1 ;, ; , yx y x αθα αθ ′ =−, (6.03) where the initial value of y is equal to θ 1 . This model has the advantage of being analytically friendly; a researcher can easily compute the derivatives and integrals for this model to check the program output. Its single parameter also provides an excellent foundation to test the utility of the proposed method of sample size determination before advancing to models that are more complicated and require more information to estimate the parameters. The wide range of applications and the computational simplicity of this one- parameter model make it an excellent starting example for this phase of evaluation. The EID-optimal designs for all of the trials that use this experimental model compute the expectation over the prior distribution according to (2.21) using a parameter mesh consisting of one thousand nodes. Sample sizes from one to thirty observations are considered. Computing the SSD criterion at each trial employs ten thousand (K = 10,000) parallel posterior predictive Markov chains, each with a length of L = 50,000 random samples. 103 Figure 6.02: Exponential Decay Response Range over Prior and Error Distributions. The upper pane displays the response range possible over the prior distributions in Table 6.01 (blue: smaller covariance, red: larger covariance), where the response at the mean parameter is indicated by a black line. The lower pane indicates the system response range at the mean parameter set under the influence of experimental error for the low (x markers) and high (+ markers) variance sets. The response range is largest at low inputs and decreases with the response magnitude due to the effect of the parabolic variance model. 104 EXPONENTIAL RISE AND FALL The next experimental model employed in this work is nicknamed the exponential rise and fall, contains two informative parameters, and corresponds to the mathematical expression () () () ( ) {} 1 21 11 2 ;, exp exp yxxx α αθ α α θα α =−−− − (6.04) The functional value begins at zero for small values of x, then rapidly rises to a peak and gradually returns to zero as x becomes large (Figure 6.03). Its informative parameters regulate the rapidity of the departure from and return to the steady state, while the theta parameter scales the height of the peak. 
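For reference, a minimal Python sketch of the two exponential responses follows; the rise-and-fall expression is one reading of (6.04) that is consistent with the initial conditions y(0) = 0 and y′(0) = α1/θ1 quoted below for (6.05), and the numerical values are illustrative only, not values prescribed by this work.

import numpy as np

def exponential_decay(x, alpha1, theta1):
    # Equation (6.02): response starts at theta1 and decays at rate alpha1.
    return theta1 * np.exp(-alpha1 * x)

def rise_and_fall(x, alpha1, alpha2, theta1):
    # One reading of (6.04): zero at x = 0, rises to a peak, then
    # returns to zero as x grows large; theta1 scales the peak height.
    scale = alpha1 / (theta1 * (alpha1 - alpha2))
    return scale * (np.exp(-alpha2 * x) - np.exp(-alpha1 * x))

# Illustrative evaluations (parameter values chosen for demonstration only).
x = np.linspace(0.0, 20.0, 100)
y_decay = exponential_decay(x, alpha1=0.5, theta1=10.0)
y_peak = rise_and_fall(x, alpha1=0.7, alpha2=0.2, theta1=0.06)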
Researchers use this experimental model to describe the motion of an overdamped spring-mass system (Huang, 1967) and the voltage and current in a series LRC circuit with real poles (Thomas and Rosa, 2003), in which it is the solution to the second-order initial value problem () ( ) ( ) ( ) 12 12 ;, ;, ;, yx y x y x αθαα αθαα αθ ′′ ′ =− + − (6.05) where the initial values of y and y’ are zero and α 1 /θ 1 , respectively. This model function also describes the amount of substance in a two-compartment mass-balance problem (Kreyszig, 1999), which solves the system of first-order differential equations 21 1 1 1 0 1 αθ α θ α −⎛⎞ ⎛⎞ ′ = ⎜⎟ ⎜⎟ − ⎝⎠⎝⎠ yy (6.06) with the initial condition of y equal to [0;1]. This format is particularly useful in chemistry and pharmacokinetics. In addition to its wide range of applications, this model is included in this work due to the work of Box and Lucas (1959), which used this model extensively during their work with D-optimality. Because this experimental model employs two parameters, the EID-optimal designs for all of the trials employ a parameter mesh consisting of 10,000 nodes to compute the expectation over the prior distribution. Sample sizes from two to sixty samples are considered. As with the exponential decay model, the SSD criterion at each trial employs ten thousand (K = 10,000) parallel posterior predictive Markov chains, each with a length of L = 50,000 random samples. 105 Figure 6.03: Exponential Rise and Fall Response Range over Prior and Error Distributions. The response for this model varies dramatically over the range of the prior distribution, which affects both the location of the peak and the rate of increase and decay of the response. Since this model has two parameters, each prior covariance produces four curves that display the responses from the combinations of the lower and upper limits for each parameter. The second plot shows the function of the parabolic variance model for both sets of variance parameters: the experimental error is at a maximum in the peak region of the response and smaller as the response decreases away from the peak. 106 HILL SIGMOID The third experimental model used in this work belongs to a class of curves called sigmoids, characterized by a step-like change between the low and high states at some critical value of input that takes the shape of the letter “S” (Figure 6.04). The most parameterized of the sigmoids corresponds to the equation ()( ) 3 3 2 11 1 2 ;, 1 x yx x α α α αθα θ θ α ⎛⎞ ⎜⎟ ⎝⎠ =−+ ⎛⎞ ⎛⎞ + ⎜⎟ ⎜⎟ ⎝⎠ ⎝⎠ , (6.07) which includes a baseline term, θ 1 , indicating the starting value of the function. The three alpha parameters indicate the maximum function value at equilibrium or steady state, the value that provides the half- maximum response (sometimes called the EC 50 ), and the “slope” or “steepness” parameter, respectively. Unlike the previous models, Equation (6.07) does not solve a differential equation; rather, Hill (1910) derived this curve to describe the binding of oxygen to hemoglobin. Researchers have since expanded it to describe a wide variety of interactions between ligands and receptors in chemistry and biology, especially those that involve cooperativity between binding sites on the receptor. Because of this, the model is often referred to as the Hill Sigmoid. 
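A corresponding sketch of the Hill response can be written directly from the parameter roles described above: baseline θ1, maximum response α1, half-maximal input α2 (the EC50), and slope α3. The functional form below is one reading of (6.07); names and example values are hypothetical.

import numpy as np

def hill_sigmoid(x, alpha1, alpha2, alpha3, theta1):
    # Baseline theta1, maximum response alpha1, EC50 alpha2, slope alpha3.
    ratio = (x / alpha2) ** alpha3
    return theta1 + (alpha1 - theta1) * ratio / (1.0 + ratio)

# The input for this model often spans several orders of magnitude,
# so a logarithmically spaced grid is a natural choice for evaluation.
x = np.logspace(-2, 2, 100)
y = hill_sigmoid(x, alpha1=10.0, alpha2=1.5, alpha3=1.0, theta1=0.5)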
For a system consisting of some receptor R, which binds up to n ligands, L, to form a ligand-receptor complex, RL n : n R nL RL +, (6.08) the dissociation constant of the LR complex is expressed as [][ ] [] n D n R L K RL = (6.09) and the fractional saturation of the receptor is defined as [] [][] n n RL Y RLR = + . (6.10) By combining these expressions, one sees that 107 [ ] [] 1 n D n D L K Y L K = ⎛⎞ + ⎜⎟ ⎜⎟ ⎝⎠ . (6.11) One can obtain an expression in the form of (6.07) by including additional parameters for the baseline and maximum responses and rearranging the equation. Note that the value of n need not be an integer; it represents the average number of ligands that bind to the receptor and not the maximum value. For example, Hill showed that n=2.8 for hemoglobin, even though each molecule has four binding sites. In addition to oxygen-hemoglobin binding, this model successfully describes the binding of hormones to their receptors (Cressie and Keightley, 1981), and the effectiveness of drugs in pharmacodynamic studies (Holford and Sheiner, 1981). Researchers have also used similar derivations of this model to describe the velocity of catalysis by enzyme-substrate complexes by both Michaelis-Menten and Briggs-Haldane mechanics (Price and Stevens, 1989). Since this model has three parameters, it is the most complex of the first-phase experimental models and illustrates the robustness of the optimal design technique. The EID-optimal designs computed for these trials compute the expectation over the prior using a parameter mesh with 15625 nodes. The SSD portion of the evaluation requires 25,000 parallel posterior predictive Markov chains, each with a length of 100,000 samples. Naturally, it requires the most observations to estimate its parameters precisely; sample sizes from three to ninety observations are considered. It is also the most computationally expensive, and provides a good idea of the time that a researcher might expect to invest when generating optimal designs and sample sizes in practice. Because of the wide range of input for this model, often covering several orders of magnitude, this work applies a logarithmic constraint to the input to hasten optimization. 108 Figure 6.04: Hill Sigmoid Response over Prior and Error Distributions. For this model, the variation of the second and third parameters affects the increasing region of the response by changing the rate and location of the step. As the prior covariance increases (red line), the response varies considerably in this region. The maximal response (the first parameter, corresponding to the maximum response) is relatively easy to estimate, but since experimental errors are maximized at high input values, replications at large input values are commonly necessary. 109 6.3 Phase Two: Practical Demonstration 6.3.1 Method of Evaluation The second phase of evaluation involves laboratory demonstration of the proposed SSD algorithm. This illustrates some of the practical aspects of the algorithm and shows how a researcher might implement the reduced designs seamlessly into their own work without modifying experiments. Practical implementation also eliminates any lingering doubts from the first phase regarding the use of simulated experiments, which rely on assumptions that might not be true in an actual experiment; in practice, the measurement error might not synchronize with the variance model or the experimental model might not approximate the natural system as precisely as in simulation. 
Finally, this phase permits the side-by-side comparison between experiments that use reduced sample sizes and their larger, naïvely-designed counterparts. Practical demonstration reveals the realistic utility of precision-based sample size reduction and allows the confident application of the proposed method in place of the typical naïve experiment design. Figure 6.05 displays the flowchart of the second phase of evaluation. This protocol focuses on three experiments, each of which includes a set of trials spanning different subjects, environmental variables, or control conditions with different parameter values to estimate. Since each trial employs the same reduced optimal design, they provide an expanded opportunity to demonstrate the robustness of the algorithm across different parameter values. The first experiment examines three different Sallen-key low pass filter circuits and uses the circuit gain to estimate the resistance values in the filter. Since the approximate resistance values are known from the build of the filter, this experiment allows some knowledge of the parameter values for verification purposes. The second and third experiments involve work performed by faculty members in the department of biomedical engineering at the University of Southern California. The investigators conducted these experiments using naïve designs and they have published their results in the literature. The reduced designs should provide a comparable result to the original experiments. Each of these experiments illustrates the seamless implementation of the proposed method of sample size determination into realistic experimental scenarios. 110 Figure 6.05: Phase Two of the Evaluation Procedure. The second-phase evaluation consists of two parts. The first phase computes the optimal designs and sample sizes for each experiment. Only one optimal sample size per control region is employed. The second part employs laboratory data to estimate the parameters and determine the precision of the experiment and examines the possible sample size reduction that the proposed SSD algorithm might allow. The first part of the evaluation is similar to the initial procedure described in Section 6.2.1. Once the prior distribution and variance model and parameters have been determined, this work computes the series of EID-optimal designs for sample sizes in increments of P, ranging from the minimum size (P) to the size of the naïve design, N + . Then, the stationary chain length is determined using the MPSRF procedure explained in Section 4.4 for five thousand parallel Markov chains. Next, the procedure computes the decision scores for the expected precision and risk at each size using K posterior predictive Markov chains of some length L, discarding the first five thousand samples as the burn-in period. At each experiment, two or three optimal sample sizes are determined based on the combined results of the two decision criteria; a design is chosen for each convergent control region using the risk decision criterion if it satisfies the rule in (4.36), and using the expected precision in (4.33) otherwise. 111 Finally, after determining the optimal sample sizes for each experiment, new data must be collected for each trial using the reduced experiment designs. While collecting new data at the reduced designs by re- conducting the experiment is preferable, the costs of experimentation and resources required limit the ability of this work to do so. 
Therefore, this phase generates data for each of the reduced sample sizes by either conducting a new experiment at the reduced design or by sampling at the reduced design by linearly extrapolating between the data points of the naïve experiment. This work uses this data to estimate the parameters and determine the 95% credibility or confidence intervals for both the reduced designs and the original, naïve design according to the original experimental procedure. Since one does not know the actual parameter values of a practical system, this work compares the “goodness” of the estimates produced by competing designs using the precision of the parameter estimates, which determine how well a given set of parameters fits the experimental model. This work shows that one can confidently replace a naïve design with the reduced design at the optimal sample size when their precisions and point estimates are equivalent. 6.3.2 Experimental Models SALLEN-KEY LOW PASS FILTER In medical electronic devices, the ability to predict impending failure and diagnose fatal flaws in hardware directly affects patient safety and well-being. One can estimate the parameters of a piece of hardware to determine the state of the system, which can assist with failure prediction and prognosis or aid in the diagnosis of the cause of a failure (Khosla, 2007). For example, a shift in the parameters from a stable configuration to the unstable pattern might indicate an imminent failure. Current early-warning systems employ a large number of sensors and are limited by the ability to process large data streams in real time. By reducing the number of data samples and sensors that a prognosis requires, one can achieve faster hardware analysis because of the reduced data load without compromising reliability and precision. In a diagnostic scenario, the examination of a limited number of specific sensor records can reduce the time required to determine the cause of failure. The ability to determine the state of hardware based on its parameters is an extremely important part of failure prediction, and a reduced optimal design can ensure that one determines these values as quickly and efficiently as possible. 112 Figure 6.06: Schematic for a Sallen-Key Low Pass Filter. The circuit employs an LM741 operational amplifier with feedback to provide a low pass characteristic. The input resistors and capacitors determine the cutoff frequency of the filter, while the output resistances adjust the gain of the circuit. For the purpose of this work, both the filter and gain resistances are equal, as are the capacitance values. The circuit requires a pair of input voltages of positive and negative fifteen volts to power the amplifier. The first practical demonstration employs an experiment to determine the resistance values of a Sallen-Key low pass filter circuit, a commonly used active filter with well-understood properties. This circuit has two main sections: a low pass part before the operational amplifier (LM741) and a portion to adjust the filter gain at the amp output, as depicted in the schematic in Figure 6.06. The filter usually has four resistance values; however, in order to provide a unique mapping between a given filter response and its parameters, this work reduces the number of unknowns to two by replicating the resistor values between the filter and gain segments. 
While this experiment presents a simplified case of a hardware system, it shows that one can use the proposed SSD method to reduce the number of sensors required to precisely prognose or diagnose a hardware system. Using the Kirchoff voltage and current laws for the circuit in the frequency domain, one can prove that the frequency response, H(f), of this circuit corresponds to () ( ) () 2 12 0 22 2 0 1 44 RR Hf ffj ω π πζ ω + = −+ + (6.12) 113 where value of the input signal frequency, f, is defined in Hertz (periods per second). Additionally, the fundamental frequency of the filter is defined as () 1 2 01212 RR CC ω − = (6.13) and 1 2 11 1 2 2 2 11 22 2 R RCRC RC ζ=+ + (6.14) (Lathi, 1992). From (6.12), one can compute the magnitude of the frequency response of the circuit as () ( ) ()() 2 12 0 2 2 222 0 1 44 RR Hf ff ω ωππζ + = −+ , (6.15) which predicts the amplitude ratio between the input and output of the filter as a function of the input frequency, also called the filter gain. At zero frequency, the gain is equal to 1+R 1 /R 2 and tends to zero as the signal frequency increases (Figure 6.07). Depending on the resistances in the circuit, the response of this filter can change dramatically, and instability can occur near the critical frequency when R 1 exceeds R 2 . This experiment involves applying a stationary, sinusoidal input signal of known voltage and frequency at the input node, V in , and using a two-channel oscilloscope to measure either the amplitude or the RMS voltage of the output sinusoid at the terminus, V out . One records the values of the input and output amplitudes or RMS values over a series of input frequencies and determines the circuit gain at the given frequency by computing the ratio of V out to V in . Since the response becomes very small at high frequencies and the operational amplifier often saturates whenever the theoretical filter output exceeds the V CC value of fifteen volts, the researcher must constantly increase or reduce the input signal voltage to continue to determine the filter gain accurately. 114 Figure 6.07: Sallen-Key Low Pass Filter Response Range over Prior and Error Distributions. The red lines in the upper pane indicate the response range for the combinations of the upper and lower 95% credibility intervals for the prior distribution. The response of the filter circuit is similar to the Hill Sigmoid function. However, this model exhibits a peculiar trait in the form of a region of instability at certain rare resistance combinations. The variance model ensures that the experimental error is highest at the plateau value. 115 Table 6.03 summarizes the prior knowledge of the experiment, where the two resistor values are the parameters of interest and the researcher knows the capacitance values with certainty. Since the resistances in the circuit must take on positive values, this experiment employs the lognormal density as the prior distribution. The mean vector and covariance matrix were chosen such that R 2 is generally larger than R 1 with minor overlap to cover the region of instability. The 95% credibility interval of the prior distribution covers many of the common resistance values found in hardware. This experiment employs the parabolic variance model with ten percent measurement error. This type of error is consistent with the measurements of the oscilloscope, which exhibits a fixed number of significant digits. 
Therefore, it is safe to assume that the instrumentation error of each measurement is correlated to its magnitude and not constant across the observation space. The oscilloscope produces very precise measurements; the primary source of experimental error results from the determination of the gain, which divides two measurements and multiplies the individual errors from the input and output measurements. The selected variance parameter vector adequately covers the effect of this error on the data. Fixed Parameters Notation Name Description Value Feedback capacitor value in μF 10.0 Pull-down capacitor value in μF 10.0 Informative Parameters Notation Name Description Filter and gain resistor values in kΩ Filter and gain resistor values in kΩ 1 θ 2 θ 1 C 2 C 1 α 2 α 1 R 2 R Prior Distribution: Lognormal μΣ CV % 95% Credibility Interval 20.0 45.0 ⎛⎞ ⎜⎟ ⎝⎠ 40.96 0.000 0.000 107.5 ⎛⎞ ⎜⎟ ⎝⎠ 40 25 ⎛⎞ ⎜⎟ ⎝⎠ 6.8750 ~ 32.101 25.070 ~ 66.323 ⎛⎞ ⎜⎟ ⎝⎠ Likelihood: Gaussian Variance: Parabolic, σ = (0.02, 0.05) Table 6.03: Framework for Sallen-Key Low Pass Filter Experiment. This experiment has two informative parameters, corresponding to the resistance values, and two fixed parameters that correspond to the capacitance values in the low pass circuit. The covariance maximizes the range of resistance values while minimizing the overlap between R 1 and R 2 , which induces a region of instability in the response. 116 Figure 6.08: Low Pass Filter Circuits Used in This Experiment. The trials in this work employ three different filter circuits with different resistance values, pictured here. The operational amplifier in each filter requires a pair of opposing fifteen-volt inputs applied at the green dots at top of the circuit. The input and output of the circuit is marked as a pair of blue dots on the left and right side of the board, respectively. The input of the each circuit is attached to a sinusoidal function generator and a two-channel oscilloscope monitors both the input and output of the circuit. The gain of the circuit is expressed as the ratio of the output to input voltages. This experiment uses three electronic circuits built for this purpose, pictured in Figure 6.08. Each has different resistance values that fall within the 95% credibility interval of the prior distribution; they all share identical capacitance values. Six separate trials for this experiment are conducted by measuring the response of each circuit twice at each frequency: independently computing the gain from both the amplitude and RMS voltage (Table 6.04). The original experiment employs a logarithmically spaced naïve design with thirty-four observations, ranging from 0.01 Hz to 100 Hz. At 100 Hz, all filters yield near-zero gain and estimated the parameters using the MLE estimator with a 95% confidence interval. This experiment computes the candidate designs using the EID criterion with a prior mesh of 15625 nodes. Because the model has two parameters, the maximum chain length is set to 50,000 and tested for stationarity using the MPSRF diagnostic over 5000 chains, which is consistent with the protocol used for the exponential rise and fall trials in the first phase of evaluation. Then, the SSD scores are computed for 117 each candidate design over ten thousand parallel posterior predictive Markov chains of length fifty thousand. From the results provided by both decision criteria, the optimal sample sizes are chosen for each of the four control regions. 
Then, the data from the proposed reduced designs is collected by re-conducting the experiment at the optimal design. To accommodate the minimum resolution of the function generator, the frequencies in the optimal designs are rounded to the nearest 0.01 Hz. Finally, this work estimates the parameters and 95% credibility interval using MAP for each trial and compares these values to those obtained from the original, naïvely-designed experiment. If the reduced and naïve designs compare favorably, their parameter estimates and credibility interval ranges should be similar. Circuit Voltage 12 3 Amplitude trial 1 trial 3 trial 5 RMS trial 2 trial 4 trial 6 Table 6.04: Experimental Trials for Sallen-Key Low Pass Filter. The experiment consists of six trials that employ three different circuits with different resistance values. For each circuit, the gain is computed from both the amplitude and the root-mean square measures of the input and output voltage. FLUORESCENCE OF INDOCYANINE GREEN IN BLOOD According to research, the cardiac output is a critical assessor of the medical condition of a patient with cardiovascular disease (Marik, 1999; Pinsky, 2002). One typically determines the cardiac output by injecting some type of indicator into the patient, either dye or cold glucose, and measuring the concentration of the indicator as a function of time. The relation between the dye concentration, c(t), at some time t, and the cardiac output, Q , is defined as () 1 0 1 t t Qctdt − ⎛⎞ =⎜⎟ ⎜⎟ ⎝⎠ ∫ (6.16) (Berne and Levy, 1997), where t 0 and t 1 are the starting and ending times of the measurement, respectively. This is a relatively invasive process, since one must insert a catheter into the subject to measure the dye concentration throughout the course of the trial. The fluorescent dye indocyanine green (ICG) provides a 118 less invasive method of measuring the cardiac output since it fluoresces with an intensity related to its concentration in the blood and can be monitored transcutaneously. Research has shown that this provides equivalent results to traditional dye dilution techniques (Maarek et al., 2004). However, ICG presents some difficulties to its wide use in clinical applications. In particular, the fluorescent intensity of the dye relates nonlinearly to its concentration in solution, requiring the generation of standard calibration curves prior to experimentation. More importantly, it is unstable in aqueous solution, which can lead to reduced light absorption, decreased fluorescence, and a shift in the wavelength of maximum absorption. The stability of ICG depends on a number of factors including the concentration of the dye, the nature of the solvent, and storage conditions. The second practical experiment for the evaluation of the proposed experiment design protocol reproduces the work of Maarek et al. (2001), who showed that adding sodium polyaspartate (PASP) to ICG in blood dramatically increases the stability of the dye. That work constructed a series of standard curves that relate the concentration of dye to its fluorescence in solution, which are an essential tool for computing the cardiac output from the trancutaneous fluorescence. Based on the experimental findings of various groups (Benson and Kues, 1978; Mordon et al., 1978; van den Biesen et al., 1995), this group devised a model to relate the experimental fluorescence, F(c), to the concentration of ICG, c, () ( ) ( ) ( ) { } 830 775 830 exp exp Fc B k c k k c = − −−+ . 
(6.17) According to this model, a region of maximal fluorescence occurs at some concentration and degrades as the parameter values change (Figure 6.09). The parameters k 775 and k 830 represent the products of the molar extinction coefficients with the mean optimal path length at 775 and 830 nanometers, respectively, and B describes the quantum yield of the fluorescence, which varies with the intensity of the excitation light. The original Maarek et al. experiment estimated each of these three parameters using the least-squares estimator for blood mixtures both with and without PASP added and over a range of wait times and found that the model parameters change predictably with the progression of time. Their experiment validated this model with an extraordinarily tight fit between the predicted response and the actual data observed. 119 Figure 6.09: ICG Fluorescence Response Range over Prior and Error Distributions. This experimental model shares a common shape with the Exponential Rise and Fall model used in the first phase of evaluation. However, the model equations are very different. The model indicates that ICG fluoresces with maximum intensity within a fixed concentration band, and that adjusting the parameters modulates the height of this peak. The selected prior distribution describes a wide range of fluorescence characteristics of ICG and covers the expected life cycle of ICG as it degrades. 120 The researchers conducted this experiment by preparing two batches of ICG stock solution in distilled water (one milligram per milliliter): one by itself and one treated with an excess of 2.6 molar PASP. Each batch of solution was “aged” for a length of time between zero and 336 hours to allow the ICG time to degrade. For each time measurement, the investigators made a series of quasi-logarithmic titrations of the stock solutions at nineteen concentrations between 0.0005 and 0.1271 molar, and added to 4 milliliters of fresh whole human blood. After mixing for ten seconds, the researchers collected data by stimulating the solution with near-infrared pulses at 775 nm and monitoring the average intensity of the ICG fluorescence over a half-minute interval at each concentration. Each observation is a whole number corresponding to the voltage of the photometer response. One can find a more detailed description of their procedure, including part numbers and specifics of the apparatus, in the aforementioned published work of this group. Of particular interest in the original experiment is its already small size, consisting of only nineteen observations; further reduction of the sample size is likely a difficult task. However, the original experiment also does not sample above 0.1271 molar, which neglects much of the information-rich tail of the response. Therefore, despite the small value of N + , a moderate degree of sample size reduction should still be possible for this experiment. This work examines six trials from the original experiment, corresponding to batches of ICG with and without PASP added for 10, 21, and 76 hours (Table 6.05). Incubation Time Chemistry 10 hours 21 hours 76 hours ICG alone trial 1 trial 3 trial 5 ICG + PASP trial 2 trial 4 trial 6 Figure 6.05: Experimental Trials for ICG Fluorescence. This experiment consists of six trials that measure the fluorescence of ICG at three different incubation times. For each timeframe, batches containing ICG alone and ICG treated with PASP were analyzed. This demonstrates how the addition of PASP stabilizes the ICG. 
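As an illustration, the fluorescence model can be written compactly in code. The sketch below assumes the functional form F(c) = B[exp(−k830·c) − exp(−(k775 + k830)·c)], which is one reading of (6.17) consistent with the rise-and-fall shape described above; the names and example values are hypothetical rather than taken from the original implementation.

import numpy as np

def icg_fluorescence(c, B, k775, k830):
    # Fluorescence rises with concentration, peaks, then declines as
    # reabsorption of the emitted light dominates at high concentration.
    return B * (np.exp(-k830 * c) - np.exp(-(k775 + k830) * c))

# Example over a concentration range similar to the original titrations.
c = np.linspace(0.0, 0.3, 200)
F = icg_fluorescence(c, B=1382.0, k775=100.5, k830=15.1)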
The Bayesian framework of this experiment employs an overdispersed lognormal prior distribution whose mean and covariance were determined such that the 95% confidence intervals of the estimates for all of the trials in the original experiment fall within two standard deviations of the mean in normal space (Table 121 6.06). This yields a relatively uninformative prior distribution over the three informative parameters. Based on the residuals from the original Maarek et al. data, this work employs the parabolic variance model to describe the experimental error. Although at first glance, it may appear that the residual variance is constant, one can see that the data and model predictions agree less closely at the top of the peak and near the function tails. In general, the data fits the model very closely; however, there are notable exceptions in which the data and model prediction do not agree. As with other experiments, the ten percent variance parameter is selected to safely describe the residual variance without running the risk of overestimating it. Fixed Parameters Notation Name Description Value no fixed parameters Informative Parameters Notation Name Description Scaling Factor product of ex tinction coeff and mean pathlength at 775 nm product of ex tinction coeff and mean pathlength at 830 nm 1 α 2 α 3 α B 775 k 830 k Prior Distribution: Lognormal μΣ CV % 95% Credibility Interval 1382 100.5 15.10 ⎛⎞ ⎜⎟ ⎜⎟ ⎜⎟ ⎝⎠ 4.416 5 0.000 0.000 0.000 307.9 0.000 0.000 0.000 9.060 E +⎛⎞ ⎜⎟ ⎜⎟ ⎜⎟ ⎝⎠ 48 18 20 ⎛⎞ ⎜⎟ ⎜⎟ ⎜⎟ ⎝⎠ 500.27 ~ 3100.8 70.005 ~ 140.01 9.9783 ~ 21.977 ⎛⎞ ⎜⎟ ⎜⎟ ⎜⎟ ⎝⎠ Likelihood: Gaussian Variance: Parabolic σ = (0.02 0.05) Table 6.06: Framework for ICG Fluorescence Experiment. This experiment has three informative parameters that describe the fluorescence of ICG at various concentrations at a given time. The model has no fixed parameters. The mean and covariance of the prior distribution were selected by examining data from previous experiments and selecting values such that all of the parameter values fall within its 95% credibility interval. This experiment begins by conducting EID-optimal designs using a mesh with 15625 nodes for sample sizes ranging from three to eighteen observations. The MPSRF is computed for the experiment using five thousand parallel chains for a maximum length of 100,000 samples. Then, this work computes the SSD scores for each criterion using K = 25,000 parallel chains of length L = 100,000. After computing the SSD decision scores for the experiment, this work computes new data for each experimental trial using linear 122 interpolation from the original experimental data between the two samples nearest to the design point and rounding to the nearest whole number to mimic the original data. Since the original experiment includes a control measurement at zero concentration, data values for inputs below 0.0005 molar are determined by interpolating between the data at 0.0005 molar and zero. Collecting data for concentrations above 0.1271 poses a more significant challenge; in this case, the three parameters are estimated by least-squares from the naïve data, a model prediction generated from these parameters, ten percent noise is added, and the value is rounded to the nearest whole number. By inspection, these synthetic data points appear to correspond to those obtained experimentally. 
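A minimal sketch of the interpolation step described above follows (hypothetical names; it assumes the naive design inputs are sorted in increasing order and that the reduced design points fall within their range).

import numpy as np

def reduced_design_data(reduced_inputs, naive_inputs, naive_data):
    # Linearly interpolate the naive experiment's observations at the
    # reduced design points and round to whole numbers, mimicking the
    # integer photometer readings of the original data set.
    values = np.interp(reduced_inputs, naive_inputs, naive_data)
    return np.round(values)

Design points above the largest naive concentration are handled separately, using the least-squares model prediction with ten percent noise as described above.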
Finally, this work uses the interpolated data to compute the LS estimate and 95% confidence intervals for each trial, and compares the results from the reduced optimal designs to those obtained from the original naïvely-designed experiment. DETERMINATION OF ANAEROBIC THRESHOLD Anaerobic threshold is an excellent assessor of the fitness of an individual, for measuring the progress of a training regimen, or for predicting athletic performance (Carey et al., 2005). The anaerobic threshold (AT) is the exercise intensity at which lactate begins to accumulate in the blood stream, which occurs when it is produced faster than it can be metabolized. The standard method of determining the anaerobic threshold of an individual involves the collection of blood samples during exercise over a range of increasing intensities. However, this technique is relatively invasive and painful, and analysis of the blood requires laboratory resources that prevent the real-time calculation of the AT. However, research has shown that the ventilatory threshold corresponds to the maximal lactate steady state (Yamamoto et al., 1991), and that one can determine the AT by measuring the ventilation of a subject at various exercise intensities instead of sampling blood. This measurement can be measured noninvasively and in real-time, allowing for much quicker determination of the subject’s anaerobic threshold. However, the data collected using the method tends to be much noisier than that from the blood. Reducing the sample sizes of these experiments can allow a researcher to run the test at each value of intensity for a longer time, which stabilizes the measurement, while simultaneously imposing a lighter overall workload on the subject. 123 Figure 6.10: Anaerobic Threshold Response Range over Prior and Error Distributions. Because this model consists of a pair of straight lines whose slope and intercept can take on a wide range of values, as indicated by the haphazard pattern of red lines in the upper pane, an additional constraint is applied that requires the lines to intersect within a range of possible AT values (bounded by the green lines). Because this experiment uses a constant variance, the width of the response range remains fixed over the input domain. 124 The final practical experiment that this work examines data collected by Yamashiro (2006) using the noninvasive ventilation method to determine the anaerobic thresholds of a group of subjects. No theoretical model exists to describe the relationship between exercise intensity and ventilation. Rather, researchers use a phenomenological model that consists of two nonparallel straight lines that intersect at the anaerobic threshold: () 00 11 mw b w AT VE w mw b w AT +≤ ⎧ = ⎨ +> ⎩ , (6.18) where m and b are parameters that dictate the slope and intercept of each line segment (Figure 6.10). These parameters can take on a wide range of values, but a limited number of specific parameter combinations intersect at an intensity value that is within the likely AT range. This experiment is only interested in obtaining an estimate for AT, which one can compute from the other parameters by algebraically solving for the intersection point of the two straight lines, defined as 10 01 bb AT mm − = − . (6.19) Aside from their use for computing the anaerobic threshold value, the other variables in the experiment have no research significance. 
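For reference, the piecewise-linear model and the derived threshold can be written directly from (6.18) and (6.19); the following sketch uses hypothetical names and illustrative slope and intercept values.

import numpy as np

def anaerobic_threshold(m0, b0, m1, b1):
    # Equation (6.19): intensity at which the two line segments intersect.
    return (b1 - b0) / (m0 - m1)

def ventilation(w, m0, b0, m1, b1):
    # Equation (6.18): piecewise-linear ventilation as a function of
    # exercise intensity w, with the break at the anaerobic threshold.
    at = anaerobic_threshold(m0, b0, m1, b1)
    return np.where(w <= at, m0 * w + b0, m1 * w + b1)

# Illustrative values; the intersection falls near 108 watts here.
at = anaerobic_threshold(m0=0.04, b0=20.0, m1=0.17, b1=6.0)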
The original Yamashiro experiment used a respirometer with a one-way valve to measure the volume flow of the subject’s exhales continuously over the course of time. First, the experiment calibrated the respirometer using a one-liter volume so that the flow volume can be adequately measured. The subject stood on a treadmill and the exercise intensity was increased by fifteen watts at every fifteen minutes, from an initial value of fifteen watts to three hundred watts (or until the subject quit from exhaustion). One must compute the ventilation from the time recordings of exhale flow for each level of intensity by integrating the flow over the time (Figure 6.11). Then, the ventilation is plotted as a function of the intensity. The researcher can compute the AT by breaking the data into two best-fit straight-line segments and finding the value of exercise intensity that corresponds to their intersection. The original experiment carried out this procedure for six trials using different subjects. 125 Figure 6.11: Collection of Ventilation Data for the Determination of Anaerobic Threshold. The data for ventilation is collected from a respirometer with a valve that measures rectified airflow, which oscillates with the breathing of the subject. The volume of air inhaled per minute is computed by integrating the respirometer flow for each level of workout intensity (red area) and normalizing by the time course. Because of the inherent difficulties involved with casting this experiment into a Bayesian framework, this work carefully studied prior ventilation data collected from twelve human subjects. Table 6.07 lists the parameter values that the Bayesian framework uses for this experiment. Although the researcher is only interested in the value of the AT, this is a secondary parameter and requires an estimate of the slope and intercept parameters for each subject. To connect the slope and intercept parameters with the allowable values of AT, this work constrains the prior distribution to values whose lines intersect within a fixed range that corresponds to the possible AT interval. This work examined the ventilation data for twelve other individuals and determined the AT range does not appear to be concentrated at any particular value of intensity. Therefore, the AT is treated the uniform interval that covers the range between 90 and 135 watts, which includes each of the observed threshold values. The prior distribution for the slope and intercept 126 parameters is determined from inspection of the data from the same twelve subjects. Since these parameters can take on both positive and negative values, the prior is represented as a normal distribution whose mean and covariance are chosen so that the prior covers the parameter values from all of the test subjects. While computing a Markov chain, one must sample from the constrained prior distribution; the algorithm executes this by randomly selecting an α vector from the prior distribution and computing the intersection of the corresponding lines according to (6.19). It accepts this vector if its intersection falls between 90 and 135 watts, and otherwise rejects the vector; the process is repeated until a sample is accepted. 
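The accept/reject step just described can be sketched as follows (hypothetical names; it assumes the prior mean and covariance are ordered as pre-threshold slope, pre-threshold intercept, post-threshold slope, post-threshold intercept).

import numpy as np

def sample_constrained_prior(mean, cov, at_low=90.0, at_high=135.0, rng=None):
    # Draw slope/intercept vectors from the normal prior and keep only
    # those whose line segments intersect inside the allowable AT range.
    rng = np.random.default_rng() if rng is None else rng
    while True:
        m0, b0, m1, b1 = rng.multivariate_normal(mean, cov)
        at = (b1 - b0) / (m0 - m1)
        if at_low <= at <= at_high:
            return np.array([m0, b0, m1, b1])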
Constrained Parameters Notation Name Description Constraint Anaerobic Threshold Informative Parameters Notation Name Description Pre-threshold slope Pre-threshold intercept Post-threshold slope Post-threshold intercept 1 α 2 α 3 α 4 α AT 1 m 1 b 2 m 2 b () 90, 135 1 β Prior Distribution: Normal μΣ CV % 95% Credibility Interval 0.0412 20.15 0.1746 5.686 ⎛⎞ ⎜⎟ ⎜⎟ ⎜⎟ ⎜⎟ ⎝⎠ 77 16 26 144 ⎛⎞ ⎜⎟ ⎜⎟ ⎜⎟ ⎜⎟ ⎝⎠ 0.001 0.000 0.000 0.000 0.000 10.98 0.000 0.000 0.000 0.000 0.002 0.000 0.000 0.000 0.000 66.80 ⎛⎞ ⎜⎟ ⎜⎟ ⎜⎟ ⎜⎟ ⎝⎠ 0.0220 ~ 0.1044 13.523 ~ 26.777 0.0852 ~ 0.2640 10.660 ~ 22.0323 −⎛⎞ ⎜⎟ ⎜⎟ ⎜⎟ ⎜⎟ − ⎝⎠ Likelihood: Gaussian Variance: Constant σ = (0.9) Table 6.07: Framework for the Determination of Anaerobic Threshold. This experiment consists of four parameters, which correspond to the slope and intercept of the two line segments. The prior distribution for these parameters was determined from the data of a larger group of subjects and exhibited a great deal of variability. The AT is a secondary parameter derived from the intersection of the line segments; these subjects all exhibited AT values between 90 and 135 watts, which was used to define the constraint. The variance model and parameter value also reflects the behavior of the prior data. The experimental error for this particular model arises almost entirely from the unpredictable breathing patterns of the human subject. By inspection of the prior data, one can see that often a subject will often hyperventilate or breathe more shallowly as a response to changes in the exercise intensity. Unexpectedly, this behavior does not correlate to any particular degree of workout intensity and appears to be random. 127 Therefore, this experiment employs a constant variance model that applies an equal amount of noise to each measurement. In this case, the variance parameter is set to 0.90, which reflects the residual variance of the twelve prior subjects. The design for this experiment also poses particular challenges to the researcher. Since the design objective is to minimize the maximum workload on the subject, this work divides the space into three regions that correspond to exercise intensities below the prior, the prior, and above the prior. By treating each line segment as its own model and sampling exclusively in regions above and below the AT range, one can easily estimate the slope and intercept parameters for each linear segment. The candidates designs for the SSD procedure are constructed from two sets of linearly spaced observations, ranging from two to ten samples each, in the regions below and above the AT boundaries. The samples in the region below AT are spaced evenly between zero and sixty watts, while the samples above AT begin with 135 watts and increase by fifteen watts per sample. Together, this provides a series of nineteen potential designs with increasing sample sizes that range from four to twenty total samples, increasing by two samples at a time. This example proves the flexibility and robustness of the decision criterion and demonstrates how a researcher can use candidate designs other than a formally defined optimal design in the proposed SSD method. The SSD algorithm also requires some modification from the original version. Since this experiment estimates the anaerobic threshold of the ventilation data without concern for the rest of the parameters, the SSD decision rule requires a population of Markov chains that reflect AT and not the slope or intercept parameters. 
For this experiment, the SSD algorithm transforms the original Markov chain into a version that reflects the distribution of the anaerobic threshold. This is performed by converting each link in the original chain to the corresponding AT value according to (6.19) and applying the proposed SSD decision criteria to this new chain. In this way, the SSD criterion determines the optimal sample size based on only the precision of the anaerobic threshold. 128 Aside from the aforementioned adjustments, this experiment proceeds exactly as the others. The MPSRF diagnostic is computed for designs consisting of 4, 10, and 20 samples using five thousand chains of length L = 50,000. The SSD scores are computed for the AT parameter using K = 10,000 parallel chains of length 50,000. Because of the nature of the anaerobic threshold, it is impossible to re-conduct the actual experiment using the reduced designs for the same subjects. This is because the AT is in a constant state of change, and the results obtained from different trials on different days are not comparable, even if they are from the same individual. Therefore, data for the reduced designs must be computed from the original data; once the optimal sample size has been computed for this experiment, new data for each subject is determined by either using the same data from the original experiment or using linear interpolation between data points from the original experiment. From this new data, this work computes the anaerobic threshold and the 95% confidence interval for each subject using the least-squares estimator at each experimental trial. One can then compare the results from the reduced designs to those obtained from the full, naïve design from the original experiment. 129 Chapter 7 Results of Evaluation 7.1 Results from Simulated Experiments As discussed in Chapter 6, the first phase of the evaluation procedure involved a series of simulated trials designed to study the behavior of the proposed criterion for different experimental models under variations in the number of parameters, prior covariance, and experimental error. This section covers the results of these experiments by examining the trials from each experimental model individually. Each experimental trial consists of three distinct stages: design and diagnostics, sample size determination, and evaluation of the proposed sample size. First, a series of EID-optimal designs were determined for a range of candidate sample sizes and the MPSRF posterior diagnostic was computed at key sample sizes spanning the range in order to ensure stationary convergence of the Markov chains. This work applied the two SSD decision rules to each candidate design and determined the set of optimal sample sizes that correspond to four different control regions for each rule. Finally, the evaluation verified the precision of the parameter estimates for each of the optimal sample sizes and compared them against the diminishing marginal utility of the experimental information and accuracy. 7.1.1 Trials 1:01 through 1:04 – Exponential Decay The first block of experimental trials employed the exponential decay model, which has a single informative parameter that describes the steepness of the descent of the response. Because of the simplicity of this model, computation carried out very quickly on a personal computer, even for the largest sample sizes. First, this work computed the EID-optimal designs for sample sizes spanning the range of 1, 2, 3… 10, 12, 14… 30 samples. 
Figures 7.01 through 7.04 illustrate the EID-optimal designs at each trial for one, two, and four samples; one can see that as discussed in Section 2.3, the EID-optimal design has the distinct feature of a spread-point design with minimal replication. Since these experiments use the quadratic variance model, the error variance term, W, in the design criterion inflicts a penalty proportional to the 130 magnitude of the observation. As the variance increases, so does this penalty. Therefore, the designs that correspond to larger variance parameters shift their design points towards smaller values, striking a balance between the information provided by a measurement and the likelihood that the measurement accurately represents the system. The covariance of the prior distribution also affects the optimal designs; the sensitivity matrix, F, in (2.17), increases in regions that have great “potential” according to the prior distribution and influences the designs to gravitate towards observations that can elicit a wide range of response values based on the potential parameter values. In this case, the design chooses lower values for the cases that use the larger prior covariance where the prior uncertainty has a larger impact on the system response. Finally, the optimal designs remain relatively spread out over the input space with some replication as the sample size increases beyond sixteen observations. The second half of the design stage for the exponential decay trials involved ensuring that the Markov chain length of fifty thousand samples is adequate to reflect a stationary posterior representation for this experiment. This determination was performed using the MPSRF diagnostic discussed in Section 4.4, which converges to one as a Markov process achieves stationarity. Since this process is extremely expensive computationally, this work only applied this procedure to the designs for sample sizes equal to one, ten, twenty, and thirty observations; it was assumed that if convergence is achieved at a series of fixed intervals, it should be achieved between those intervals as well. Figures 7.05 through 7.08 illustrate the results of the posterior diagnostic for each of the four trials using a Markov chain length of fifty thousand. In every case, the reduction factor quickly tends to one and is convergent and stable at the maximum chain length, which shows that this Markov chain length is a good value to use for the SSD procedure. The rapid speed that the reduction factor converges is most likely the result of computing the posterior over a single parameter. At this stage, the prior covariance and variance of measurement error did not seem to have a tangible effect on the chain convergence, since the convergence across the trials is similar. An unexpected result of this stage is that the Markov process for a larger sample size converged more slowly than that of a smaller sample size. The cause of this phenomenon will be explored in Section 7.3. 131 Figure 7.01: EID-optimal Designs for Trial 1:01. Dotted lines indicate the response range over the prior distribution. The first design point falls at the region of maximal information, where the model is most sensitive to changes in the parameter. As the sample size increases, the points spread out over the design space. Figure 7.02: EID-optimal Designs for Trial 1:02. The design points roughly follow the same pattern as with the previous trial. 
Because of the increase in error variance, the design points shift to the right, to values of lower response (and error). The spread of the design points remains relatively unchanged. 132 Figure 7.03: EID-optimal Designs for Trial 1:03. Because of the increase in the response range due to the prior covariance, the design points have shifted to the left. The single-point design falls at the point of maximum sensitivity, which does not correlate to the point of maximum response range. Figure 7.04: EID-optimal Designs for Trial 1:04. As with trial 1:02, the increase in variance causes the design points to shift slightly to the right relative to the previous trial. Because the design points at the lower input values provide ample information, these design points only exhibit a small shift to the right. The design points at higher inputs tend to shift more under the increased experimental error. 133 Figure 7.05: MPSRF Posterior Diagnostic for Trial 1:01. These examples compute the reduction factor using five parallel, “overdispersed” chains over 5000 observation vectors. These plots represent the RMS of the 5000 reduction factors after discarding the largest one percent of values. Because this is an expensive process, the MPSRF is applied to four sample sizes across the range, rather than at each one individually. Figure 7.06: MPSRF Posterior Diagnostic for Trial 1:02. As seen in trial 1:01, the reduction factor approaches the value of one relatively quickly and is solidly convergent at fifty thousand samples. The larger designs converge more slowly than the smaller in a very predictable, sequential manner. 134 Figure 7.07: MPSRF Posterior Diagnostic for Trial 1:03. This trial requires more samples to achieve stationarity than the previous trials. It is interesting to note that the larger sample sizes clearly converge more slowly than the smaller sample sizes, even though the posterior distribution is more precise. Figure 7.08: MPSRF Posterior Diagnostic for Trial 1:04. As with the other trials, the Markov process converges within the designated length. Even at the largest sample size, the reduction factor converges to 1.0 within the chain length. A length of fifty thousand samples is sufficient for this model for all trials. 135 Figure 7.09: Sample Size Determination for Trial 1:01. The upper pane shows the expected control computed by the first decision criterion, while the lower panes show the risk as computed by the second decision criterion. The SSD process employs four different control regions of 20, 15, 10, and 5 percent of the mode, corresponding to increasing degrees of precision. The target values for each criterion are indicated by red dotted lines. The risk criterion uses a fixed control threshold of Q 0 = 0.90 for all cases. 136 Figure 7.10: Sample Size Determination for Trial 1:02. Increasing the variance of the experimental error affects the optimal sample sizes by driving them to higher values. This is especially apparent in the two plots of experimental risk, which show that the optimal sample sizes increase dramatically over the previous case. The error destabilizes the population of chains by increasing the variance of their control; this lack of consistency across the Markov chains is reflected in the observed risk values. 137 Figure 7.11: Sample Size Determination for Trial 1:03. This trial exhibits the effect of increasing the prior covariance, while keeping the experimental error relatively low. 
The expected control is remarkably similar to the results from the previous trial, which increased the error and maintained the covariance. However, the risk plots show that the increase to the covariance destabilizes the control more than the error. 138 Figure 7.12: Sample Size Determination for Trial 1:04. This case exhibits the difficulty encountered when dealing with an experiment with both high covariance and experimental error. The expected control increases more slowly for the precise control regions, while the expected control of the two most imprecise regions remains relatively unaffected. However, the optimal sample sizes increase dramatically under the risk-based criterion, indicating that the control varies dramatically between parallel chains. 139 Once this work computed the set of optimal designs for each trial, it determined the optimal sample sizes for each experiment. This process used both the expected precision criterion (4.33) and the risk criterion (4.36) for sample size determination using four increasing control regions. Figures 7.09 through 7.12 display the results of the SSD determination for both decision criteria at each of the four trials. The top pane illustrates the results from the expected precision criterion, which measures the fraction of the posterior chain within the control region as a function of the sample size. The various traces represent the estimator precision of elliptical control regions of increasing size according to (4.32); the first series (red) represents the region with a boundary defined by five percent of the mode value, while the fourth (blue) represents a twenty percent control boundary. As N grows, the expected precision increases and eventually converges to one, indicating that 100% of the posterior samples fall within the control region. According to this decision criterion, the optimal sample size occurs at the value of N that crosses Q 0 = 0.90. The lower pane displays the results of the risk-based decision criterion for p t ’ values of 0.10 and 0.05. The evolution of the risk as the sample size increases looks dramatically different from the expected precision curve. It begins at 100% risk, indicating that there is zero chance that the design satisfies the precision requirement. As the sample size increases and the experiments become more precise, the risk rapidly decreases to zero percent, which indicates that less than p t ’ of the experimental outcomes fail to satisfy the precision criterion. The optimal sample size according to the risk criterion is that which reduces the risk to below five percent (β 0 = 0.05) for the assumption that p t ’ of the chains are at least ninety percent within the control region. The first case (Figure 7.09) is the easiest to compute estimates from since it has the smallest prior covariance and measurement error. As one might expect, the first trial converges the most easily; even the most stringent control region with radii at five percent of the posterior mode reaches the Q 0 = 0.90 threshold after only five samples. The results of the risk criterion supports these findings, since its optimal sample sizes lag behind the expected precision criterion at twelve and eighteen samples for values of p t ’ at 0.10 and 0.05. The two middle trials illustrate how increasing either the prior covariance or the variance parameter affects the optimal sample size. 
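Before examining those middle trials in detail, it is worth making the two decision rules concrete. The following minimal sketch assumes that, for a fixed sample size N, a set of posterior chains has already been simulated (one chain per simulated experimental outcome), each stored as an array of draws together with its posterior mode. The elliptical control region is written in one plausible form consistent with the description of (4.32), and the risk computation is deliberately simplified; the exact region, and the confidence adjustment at the β 0 level in (4.36), follow the definitions given in Chapter 4. Names such as control_fraction are illustrative only.

    import numpy as np

    def control_fraction(chain, mode, rho):
        # Fraction of posterior draws inside an elliptical control region whose
        # semi-axes are rho * |mode| along each parameter (one plausible reading
        # of the region described in (4.32)).
        z = (chain - mode) / (rho * np.abs(mode))
        return np.mean(np.sum(z * z, axis=1) <= 1.0)

    def ssd_scores(chains, modes, rho, q0=0.90):
        # chains: list of posterior chains (n_draws x n_params), one for each
        # simulated experimental outcome at a fixed sample size N.
        q = np.array([control_fraction(c, m, rho) for c, m in zip(chains, modes)])
        expected_control = q.mean()          # decision rule 1 score, cf. (4.33)
        failure_fraction = np.mean(q < q0)   # outcomes that miss the Q0 target
        return expected_control, failure_fraction

Under this reading, the first rule selects the smallest N whose expected control reaches Q 0 = 0.90, and the second selects the smallest N whose failure fraction is reliably below p t '; the exact definitions in (4.33) and (4.36) govern the results reported here.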
In trial 1:02, which uses the same prior distribution but assumes a larger error variance, the control curve increases more slowly and dictates larger optimal sample sizes. The curves corresponding to smaller control regions are affected much more strongly than the traces from the larger control regions, which remain almost unchanged. For example, the curve corresponding to the twenty percent control region shows a sample size increase from one to two samples, but the five percent curve increases from five to eighteen samples. The optimal sizes determined by the risk criterion follow a similar, though more dramatic, pattern. The optimal sizes for the twenty and fifteen percent control regions increase by a few samples, but the size corresponding to the ten percent control region increases sharply to ten observations at p t ' = 0.10, and the smallest region does not satisfy the design criterion within the allowable range of sample sizes. In the other case, where the prior covariance is increased without increasing the experimental error, the curve from the first criterion increases only slightly more slowly than in trial 1:02. The results generated by the risk criterion for this trial are very different, however; the optimal sample sizes are dramatically larger than those from the first two trials, approximately doubling the optimal sample sizes from trial 1:01. Finally, trial 1:04 represents the unfortunate case in which the experimenter is burdened with both high prior covariance and high measurement error. The SSD scores in Figure 7.12 illustrate a slowed increase in the average control for all of the control regions, with optimal sample sizes taking on values of 5, 8, and 16 samples. As large as these sample sizes have become, the risk criterion illustrates just how difficult this experimental scenario actually is; only the largest two control regions can achieve the required degree of reliability, and that comes at very high sample sizes: 16 and 24 samples for the twenty and fifteen percent control regions, respectively. Table 7.01 states the optimal sample sizes for these four trials as determined by both the control and risk criteria.

            Decision Rule 1: Control        Decision Rule 2: Risk
TRIAL       0.05  0.10  0.15  0.20          0.05  0.10  0.15  0.20      MAX
1:01           5     2     2     1            12     5     2     2       30
1:02          18     5     3     2           ---    10     5     3       30
1:03          18     6     4     3           ---    18     9     5       30
1:04         ---    16     9     5           ---   ---    24    16       30

Table 7.01: Optimal Sample Sizes for Trials Using the Exponential Decay Model. The table illustrates the optimal sample sizes for both proposed criteria at the four values of ρ. The control rule is satisfied when Q exceeds 0.90; the risk rule is satisfied when β falls below five percent for Q 0 = 0.90 and p t ' = 0.10.

After computing the optimal sample sizes using the proposed decision rules, this work computed the MAP parameter estimates and 90% credibility intervals by simulating an experiment at each of the eight optimal sample sizes. This evaluation determined the robustness of the estimates and credibility intervals across the parameter space by simulating at four "true" parameter values that span the prior distribution. For trials 1:01 and 1:02, case 1 corresponds to a parameter value of 0.218, located in the leading tail of the lognormal prior distribution; cases 2 and 3 correspond to 0.490 and 0.771, taken from the middle of the prior, near its mode. Finally, the fourth case corresponds to a parameter value of 1.451, which falls in the lagging tail of the prior distribution and has a higher pdf value than case 1.
Trials 1:03 and 1:04 employ a wider covariance and use parameter values of 0.103, 0.291, 0.824, and 2.334 for the four cases, which share the same relative locations in the prior distribution as the previous values. Figures 7.13 through 7.16 illustrate the results of this evaluation for trials 1:01 through 1:04. The top and bottom panes correspond to the estimator precision provided by the optimal sample sizes obtained from the expected control and risk decision rules, respectively. The dashed black line indicates the “true” parameter value for each case. This evaluation illustrates a number of interesting behaviors of the optimal sample sizes. First, one can see that regardless of the sample size, the point estimates remain relatively accurate. Second, the sizes of the credibility intervals correspond to the ρ of their respective control regions. For example, for the control region corresponding to ρ = 0.20, the credibility intervals only exceed twenty percent of the magnitude of the estimate using the control decision rule in the two extreme cases, and are never wider than 20% using the risk decision rule. This result supports the objective of the SSD procedure for each of these decision rules; the control rule uses the expectation and can fizzle in regions of low prior probability, but the risk rule is designed to work universally. One can also see that the credibility intervals for the results generated by the risk criterion are generally smaller than the intervals generated by the control criterion due to the increased information required to ensure that one can precisely estimate parameters in the tails of the prior distribution. Finally, this evaluation shows that the relative size of the credibility intervals remains relatively similar across all four test cases, demonstrating robustness across the prior distribution. 142 Figure 7.13: 90% Credibility Intervals Using Optimal Sample Sizes for Trial 1:01. These figures display the largest credibility interval encountered over twenty trials at each case. The optimal sample sizes for each control region and criterion are shown. The dots indicate the MAP point estimate, while the error bars indicate the range of the credibility interval. The numbers above each estimate indicate the length of the longer error bar relative to the MAP estimate, which should be less than the size of the control region. 143 Figure 7.14: 90% Credibility Intervals Using Optimal Sample Sizes for Trial 1:02. As with the first case, the upper pane illustrate the credibility intervals for sample sizes determined by expected control, while the lower pane presents the results from sample sizes determined by the risk criterion. Again, the cases that correspond to the tails of the prior distribution exhibit less precision than the design specification for the expected control, but this behavior is balanced out by improved performance near the prior mean. 144 Figure 7.15: 90% Credibility Intervals Using Optimal Sample Sizes for Trial 1:03. This trial exhibits behavior similar to the previous two trials. While the results provided by the expected control average to the control specification, those provided by the risk criterion are consistently precise. The additional samples required by the second criterion allow the researcher to precisely estimate the parameters from the tails of the prior distribution, in addition to those that fall in the center. 145 Figure 7.16: 90% Credibility Intervals Using Optimal Sample Sizes for Trial 1:04. 
In spite of the high experimental error and wide prior covariance, the credibility intervals exhibited by this trial rigorously adhere to the precision requirement at each design. Again, cases where the true parameter falls in the tail of the prior distribution exhibit larger-than-expected credibility intervals for the sample sizes from expected control, but this is balanced by the precision of the case from the center of the prior. 146 Figure 7.17: Experimental Accuracy and Information for Trial 1:01. The optimal sample sizes determined by expected control are illustrated as broken vertical lines, while solid vertical lines indicate those determined by the risk criterion. The color of each vertical line indicates the precision specification, ρ. Figure 7.18: Experimental Accuracy and Information for Trial 1:02. The optimal sample sizes occur in two distinct regions: either immediately or after the “elbow” regions, depending on the initial accuracy. Unlike trial 1:01, the initial accuracy is low, causing more optimal sizes to fall in the region after the elbow. The most precise designs (orange solid and red dotted) occurs when the error is barely changing. 147 Figure 7.19: Experimental Accuracy and Information for Trial 1:03. The increase in prior covariance diminishes the accuracy of the estimates. In this case, the optimal sample sizes for the most precise control regions occur after the error curve has begun to taper off. Figure 7.20: Experimental Accuracy and Information for Trial 1:04. The combination of error variance and uncertainty reduces the accuracy. Because of the low initial accuracy, even the least precise designs occur after the elbow of the error curve. These designs fully capture the diminishing marginal utility of the experimental accuracy. 148 The second stage of evaluation employed a trio of standard curves for each trial to illustrate how the sample size affects each of these qualities. The information curves were computed as the negative log of the EID criterion at the optimal design, while the accuracy curves are represented as the average of the estimation error for the MLE and MAP estimators over 5000 observations simulated from random parameter values. For illustration purposes, a regression estimate was determined for each curve from the data and plotted alongside the data set; information curves fit a logarithmic regression, while the accuracy curves fit to a power regression. These fit the data very well, consistently exhibiting a coefficient of determination of greater than 0.98 in every case. The optimal sample sizes for each criterion and control region were labeled on these standard curves to determine the relationship between diminishing marginal utility of the sample size and the optima determined by the proposed method. Figures 7.17 through 7.20 illustrate the optimal sample sizes against the information and accuracy curves for the four trials. One can see that the information increases logarithmically and exhibits diminishing marginal utility as predicted by Lindley. Additionally, the estimation error decreases toward zero as the sample size increases and exhibits diminishing utility as well. The vertical solid lines in each plot indicates the optimal sample size determined by the risk criterion, while the vertical broken lines denote the optimal sample sizes from the expected control. The color of each line corresponds to the size of its control region, using the same color standard as the previous figures (red: ρ = 0.05, blue: ρ = 0.20, et cetera). 
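For readers who wish to reproduce the fitted reference curves, the following sketch shows the two regressions used for the standard curves, assuming the candidate sample sizes and the corresponding information and error values are available as arrays (n, info, and err are illustrative names). The information curve is fit logarithmically and the accuracy curve as a power law, with the power law fitted as a straight line in log-log space.

    import numpy as np

    def r_squared(y, y_hat):
        # Coefficient of determination for a fitted curve.
        ss_res = np.sum((y - y_hat) ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        return 1.0 - ss_res / ss_tot

    def fit_information_curve(n, info):
        # Logarithmic regression, info ~ a + b*log(N), for the information curve.
        b, a = np.polyfit(np.log(n), info, 1)
        return (a, b), r_squared(info, a + b * np.log(n))

    def fit_accuracy_curve(n, err):
        # Power regression, err ~ c * N**d, fitted as a line in log-log space.
        d, log_c = np.polyfit(np.log(n), np.log(err), 1)
        c = np.exp(log_c)
        return (c, d), r_squared(err, c * n ** d)

In the trials reported here, both fits consistently returned coefficients of determination above 0.98.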
One can make a couple of key observations from the evaluation presented in Figures 7.17 to 7.20. In particular, the optimal sample sizes fall in one of two regions on the standard curve. In instances where the smallest sample sizes already yield reasonable accuracy, the optimal sample sizes for the wider control regions occur early. This behavior is evident in the results of trials 1:01 and 1:02 (Figures 7.17 and 7.18); in these instances, designs with fewer observations provide adequate precision and do not need to capture the marginal utility of the information. Conversely, a different behavior occurs with trials 1:03 and 1:04, where the experiments yield less accurate results (Figures 7.19 and 7.20). In these cases, the optimal sample sizes occur in regions of the curve where the error has tapered off and diminished to near its final value. This capture of the point of diminishing marginal utility is the target behavior of the proposed SSD criterion.

7.1.2 Trials 2:01 through 2:04 – Exponential Rise and Fall

The next four experimental trials employed the two-parameter exponential rise and fall model, which shares much of the simplicity of the previous model. The primary difference is the addition of an extra parameter, which increases the dimension of both the Markov chain and the control region. As was done with the first experimental model, these trials began with the computation of the EID-optimal designs (Figures 7.21 through 7.24). As predicted by Box and Lucas (1959), the optimal design for this model places an equal number of design points on either side of the peak response. Similar to trials 1:01 through 1:04, increasing the variance of the measurement error produced designs that favor smaller observations, pushing them away from the maximum. In addition, increases in the prior covariance had a slight effect on the design points, especially as the sample size increases; the design points shift toward measurements that provide a large potential response range. One should also note that these optimal designs show a small degree of design-point replication as the sample size increases. The MPSRF diagnostics for these four trials (Figures 7.25 through 7.28) demonstrate that the Markov chain length of fifty thousand samples is adequate to achieve a stationary posterior sample, since the reduction factor for each trial always converges to one. The SSD procedure generated good results for each of the combinations of prior covariance and measurement error variance. In trial 2:01, which exhibits the lowest prior covariance and residual variance, the algorithm finds optimal sample sizes for all four of the control regions under both the control- and risk-based criteria (Figure 7.29). As seen with the previous experimental model, the control curves increase very quickly and converge to one, indicating that 100% of the Markov chain is within the control region. When the variance of the experimental error was increased (Figure 7.30), the rise of the control curves slowed dramatically and the optimal sample sizes increased. As seen with the first four trials, the curves corresponding to the smaller control regions are affected more strongly than the curves corresponding to the larger control regions; the optimal sample size for the twenty percent region goes from two to four samples, while the sample size for the five percent region moves from twelve to fifty-two samples!
These changes are reflected in the optimal sizes computed by the risk criterion, where the ρ = 0.10 sample size increases from six to eighteen samples, and the ρ = 0.05 increases from fourteen samples to more than sixty. 150 Figure 7.21: EID-optimal Designs for Trial 2:01. Because of the effects of the model parameters and the preference of the variance models for lower responses, the optimal design points split to either side of the peak. Because the response increases very quickly, the points before the peak are tightly bunched together. Figure 7.22: EID-optimal Designs for Trial 2:02. By increasing the variance parameters, the optimal design points shift to lower values. In this instance, this shifts the design points away from the peak. The distance between the optimal design points is relatively unaffected by the increased error variance. 151 Figure 7.23: EID-optimal Designs for Trial 2:03. Increasing the prior covariance dramatically changes the response range of the model. The design points after the peak have spread out so far that some of them might actually fall on or before the peak. This ensures that all possible responses are represented in the design. Figure 7.24: EID-optimal Designs for Trial 2:04. These designs exhibit all of the behavior that one would expect. The design points spread out and shift toward lower response values because of the variance. In spite of the increased experimental error, the relative spread of the design points remains unchanged. 152 Figure 7.25: MPSRF Posterior Diagnostic for Trial 2:01. The reduction factor was computed for chain lengths up to fifty thousand samples for the optimal designs consisting of 2, 20, 40, and 60 observations. In this case, the Markov process for each design becomes stationary within the designated chain length. Figure 7.26: MPSRF Posterior Diagnostic for Trial 2:02. Again, the reduction factor easily converges to one over the maximum length of fifty thousand samples for each of the test designs. 153 Figure 7.27: MPSRF Posterior Diagnostic for Trial 2:03. Increasing the prior covariance provides some difficulty to the convergence of the reduction factor in this case for N=40 and 60. However, the reduction factor is no longer changing at fifty thousand samples, and can be considered convergent. Figure 7.28: MPSRF Posterior Diagnostic for Trial 2:04. Again, the reduction factor converges to one. Therefore, the maximum Markov chain length of fifty thousand samples is sufficient for this set of trials. 154 Figure 7.29: Sample Size Determination for Trial 2:01. Due to the low covariance and error variance, the optimal sample sizes occur at very low values. For example, a sample size of 18 observations satisfies the risk criterion for p t ’ = 0.10, which virtually ensures that the 90% credibility interval will be within 5% of the mode for each parameter. The sample sizes for the other control regions occur at even lower values. 155 Figure 7.30: Sample Size Determination for Trial 2:02. Again, increasing the experimental error affects the precise control regions much more strongly than the larger regions. This is readily apparent in the expected control plot, where the red trace increases very slowly relative to the others. Increasing the error variance strongly affects the risk; the optimal sample sizes for the larger control regions are only slightly larger than in the previous trial, but the ρ = 0.05 design no longer has an optimal size at either p t ’. 156 Figure 7.31: Sample Size Determination for Trial 2:03. 
For this experimental model, increasing the prior covariance does not affect the precision as strongly as the error variance did in the previous trial. In fact, the optimal sample sizes computed in this trial are only slightly higher than those in trial 2:01. This differs from the single-parameter case in trial 1:03, and is likely due to the existence of multiple model parameters that must be estimated.

Figure 7.32: Sample Size Determination for Trial 2:04. As seen with the previous experimental model, the combination of high covariance and error variance raises optimal sample sizes and prevents consistent estimator precision. In particular, the control region corresponding to ρ = 0.05 fails to satisfy the precision requirement for the risk criterion and barely reaches the Q 0 = 0.90 threshold for the control criterion. The other control regions are not affected as strongly, but still increase their sample sizes.

Trial 2:03, which increased the prior covariance without increasing the measurement variance, illustrates a concrete difference between increasing the prior covariance and increasing the measurement error. Unlike trial 2:02, which resulted in a dramatic increase in the optimal sample sizes, the control curves in this trial increased only slightly more slowly than those from the smaller prior covariance in trial 2:01. The optimal sizes computed by the risk criterion were affected slightly more strongly than the control curves. For example, the ρ = 0.05 design increases from fourteen to twenty samples at p t ' = 0.10 between trials 2:01 and 2:03 (this criterion fails to converge at all in trial 2:02). In this experiment, the optimal sample sizes were driven up more by experimental error than by prior uncertainty. The final trial combined the large covariance with the increased variance of experimental error, and shows SSD responses that combine the difficulties seen in trials 2:02 and 2:03. The curve produced by the control criterion increases more slowly than in any of the other trials, slightly worse than the trial 2:02 results. For example, the curve for the smallest control region increases very slowly and does not cross the threshold at Q 0 = 0.90 until 56 samples have been used, as opposed to 52 samples in trial 2:02 and 12 samples in trial 2:01. The risk criterion shows similar behavior, where the optimal sample size for ρ = 0.10 increases to 24 observations for p t ' = 0.10, as opposed to 18 in trial 2:02 and 6 in trial 2:01. Table 7.02 shows the optimal sample sizes as determined by trials 2:01 through 2:04, and the effect of the prior covariance and experimental error.

            Decision Rule 1: Control        Decision Rule 2: Risk
TRIAL       0.05  0.10  0.15  0.20          0.05  0.10  0.15  0.20      MAX
2:01          12     4     2     2            14     6     2     2       60
2:02          52    12     6     4           ---    18     8     4       60
2:03          14     4     2     2            20     6     4     2       60
2:04          56    16     8     4           ---    24    10     6       60

Table 7.02: Optimal Sample Sizes for Trials Using the Exponential Rise and Fall Model. The table illustrates the optimal sample sizes at the four control regions, ρ. The control rule is satisfied when Q exceeds 0.90; the risk rule is satisfied when β falls below five percent for Q 0 = 0.90 and p t ' = 0.10.

As with the previous trials, this work evaluated the optimal sample sizes for trials 2:01 through 2:04 using the estimator precision of simulated trials. Again, each trial employed four test cases that involved different "true" parameter values taken from the prior distribution.
For the smaller prior covariance used in trials 159 2:01 and 2:02, these values are [1.08; 0.070], [0.569; 0.128], [0.707; 0.085], and [0.892; 0.137]. For trials 2:03 and 2:04, which employ a larger prior covariance, the evaluation uses parameter values of [0.591; 0.110], [0.644; 0.195], [0.574; 0.300], and [0.738; 0.170] for the four cases. The results of this evaluation for these four trials are illustrated in Figures 7.33 through 7.36. Since this experiment estimated two parameters instead of just one, the results for each decision rule are divided into two panes: one for each parameter. As before, the upper pair of panes corresponds to the first decision rule (expected control), while the lower panes correspond to the second decision rule (risk). As seen with the previous trials, the estimates for each sample size are relatively accurate for each of the test cases at each optimal sample size. One can see that the first parameter is much more difficult to estimate than the second; the credibility intervals of the first parameter tend to adhere very tightly to the precision requirement, while the precision of the second parameter is often much greater than required. Consistent with the previous results, the range of the credibility interval depends on the radius of the control region and SSD decision rule. The cases that correspond to the tails of the prior distribution occasionally exceed the precision requirement using expected control, but never for the risk-based rule. This behavior occurs regardless of the prior distribution or measurement error. The second phase of the evaluation for this experiment involved comparing the optimal sample sizes against the average accuracy and information of the experiment. Figures 7.37 through 7.40 compare the optimal sample sizes from each decision rule to the information and accuracy provided by each sample size, exactly as with the previous experimental model. In each of these trials, the optimal designs tend to follow the pattern in which the designs corresponding to larger control regions (and are less precise) occur early in the design, where the error is already fairly low, while the most precise designs occur at sample sizes in which the error has already decreased to near its final value. In each of trials 2:01 through 2:04, the risk- based design corresponding to ρ = 0.10 and the control-based design corresponding to ρ = 0.05 consistently falls below half and about one third of the maximum error value on the curve. These findings are consistent with the results obtained from the first experimental model in trials 1:01 to 1:04. 160 Figure 7.33: 90% Credibility Intervals Using Optimal Sample Sizes for Trial 2:01. This figure exhibits the largest credibility intervals encountered from 20 simulations for each case. One can see that the two parameters are not estimated with the same degree of ease; the precision of the first parameter is much smaller than that of the second parameter. However, the credibility intervals follow the same pattern as with the previous model, in which the tail simulations tend to be less precise than those located in the center of the prior distribution. 161 Figure 7.34: 90% Credibility Intervals Using Optimal Sample Sizes for Trial 2:02. Increasing the variance of the experimental error exhibits a minor affect on the optimal designs using both criteria. However, the credibility intervals are still well within their respective design specifications. 
As seen in other instances, the designs using expected control slightly exceed their specification when the true parameter value falls in the tail of the prior distribution. The designs using the risk criterion do not. 162 Figure 7.35: 90% Credibility Intervals Using Optimal Sample Sizes for Trial 2:03. Broadening the prior distribution exhibits little affect over the credibility intervals that the optimal sample sizes produce. Although these designs employ more samples, they efficiently negate the extra prior uncertainty and yield credibility intervals that correspond with the size of their respective control regions. 163 Figure 7.36: 90% Credibility Intervals Using Optimal Sample Sizes for Trial 2:04. As was seen with the trials using the previous experimental model, increasing the prior covariance and experimental error proves quite difficult, and the most precise control regions cannot yield an optimal sample size. However, those designs that are recommended by the algorithm adhere vigorously to their design specification. 164 Figure 7.37: Experimental Accuracy and Information for Trial 2:01. The smaller covariance and error provide accurate estimates at low sample sizes, and the optimal sample sizes all occur relatively early. Figure 7.38: Experimental Accuracy and Information for Trial 2:02. Because the initial accuracy for this experiment is lower than trial 2:01, many of the optimal sample sizes are shifted into the region where the error concaves upward. Only the coarsest regions yield optimal sample sizes in the initial region, where the error is about 10%. 165 Figure 7.39: Experimental Accuracy and Information for Trial 2:03. Increasing the prior covariance has little effect on the accuracy, and the designs behave similarly to their behavior in trial 2:01. Figure 7.40: Experimental Accuracy and Information for Trial 2:04. The accuracy suffers dramatically under the weight of both prior uncertainty and experimental error. The bulk of the designs occur after the elbow region in the accuracy curve, especially those obtained using the risk criterion. 166 7.1.3 Trials 3:01 through 3:04 – Hill Sigmoid The final set of trials to evaluate the proposed SSD method employed a generalized sigmoid model with three parameters. As with the previous trials, the first step entailed computing the EID-optimal designs for each of the four trials at sample sizes ranging from three to ninety observations. Figures 7.41 through 7.44 illustrate the optimal designs for each trial at N = 3, 6, and 12 samples. In general, these optimal designs computed without any difficulty at low size, but often exhibited an extreme computation time (days) for designs exceeding seventy samples. The optimal designs for these trials exhibited the same general behaviors as with the previous experimental models; they exhibited a spread-point design with replications of some design points as the sample size increased beyond twelve samples. Increasing the variance parameter influenced the design points to lower values, whose measurements yielded observations with smaller magnitude. In addition, increasing the prior covariance spreads the design points to regions with the widest range of potential response values for each parameter. The posterior diagnostics for trials 3:01 through 3:04 ran differently than those for the first two experimental models. The MPSRF diagnostic was initially run for a maximum length of fifty thousand samples, but the reduction factor failed to converge. 
Therefore, this work increased the maximum chain length to 100,000 samples and the re-computed the reduction factors. Figures 7.45 through 7.48 illustrate the results of the posterior diagnostics for trials 3:01 through 3:04. At the extended length, the reduction factor for the Markov chains at each trial converged to one with varying degrees of success. In particular, the diagnostic at a sample size of ninety observations failed to converge to one. However, the reduction factor has stabilized in each of these cases; one can determine by power regression that at the current rate of convergence, more than 250,000 samples would be required to bring the reduction factor to one, which is well beyond what is computationally reasonable. Further examination of these trials indicates that the reduction factors fail to converge to one because the between and within chain variance terms are both extremely small, but their ratio is not one. This is a potential limitation of the MPSRF method; if the m chains that contribute to the reduction factor converge very precisely, but not relative to one another, its value might not converge to one. This concept is explored in more detail in Section 7.3. 167 Figure 7.41: EID-optimal Designs for Trial 3:01. The two clusters of design points in the slope region work in concert to determine the second two parameters. The first parameter is determined by sampling at the highest available input, where the response is perfectly flat for all parameter values. Figure 7.42: EID-optimal Designs for Trial 3:02. Increasing the experimental error forces the design points on the slope toward lower values, and clusters them more tightly to support one another against errors. Unlike previous experimental models in which all of the design points were altered by the error, the design points at high input values remain entirely unaffected. 168 Figure 7.43: EID-optimal Designs for Trial 3:03. Increasing the prior covariance causes the design points along the slope to spread out so that they can cover more ground in response space. Figure 7.44: EID-optimal Designs for Trial 3:04. The clustering behavior of the design points when subjected to increased error is more prevalent, allowing them to mitigate the effect of the error variance by supporting one another and averaging out any experimental errors. 169 Figure 7.45: MPSRF Posterior Diagnostic for Trial 3:01. The Markov process fails to converge to one at 50,000 samples, but by increasing the chain length to 100,000, the process is able to converge to one for each of the sample sizes. Figure 7.46: MPSRF Posterior Diagnostic for Trial 3:02. Even in light of increased experimental error, the reduction factor at 100,000 samples easily achieves convergence to one for all sample sizes. 170 Figure 7.47: MPSRF Posterior Diagnostic for Trial 3:03. Unlike the other three trials, this trial does not converge to one at high sample sizes. This is because the posterior distribution becomes very precise and the between- and within-chain variances become very small, but their ratio does not equal one. However, the reduction factor has stabilized and would require more than twice as many samples to reach a value of one. Figure 7.48: MPSRF Posterior Diagnostic for Trial 3:04. This trial converges much more quickly than the previous one. The variance parameter makes the chain variances manageable and stabilizes their ratio. 171 Figure 7.49: Sample Size Determination for Trial 3:01. 
Without significant prior knowledge and a relatively silent experiment, estimating three parameters with excellent precision proves itself a daunting task. Even at this early trial, the ρ = 0.05 control region fails to yield an optimal size for either criterion. However, one can obtain a 90% credibility interval within 10% of the estimate relatively easily, and less precise estimates even more so. 172 Figure 7.50: Sample Size Determination for Trial 3:02. The previous trial showed that this experiment is already teetering on a precipice, and the addition of increased experimental error sends it crashing. The two most precise control regions completely fail to converge, and the ρ = 0.15 region shows difficulty. When estimating many parameters, the experimental error must be kept to an absolute minimum to avoid unnecessarily large optimal sample sizes. 173 Figure 7.51: Sample Size Determination for Trial 3:03. The consequences for increasing the prior covariance are not nearly as dire as increasing the error variance, but it is close. In this case, the optimal sample sizes increase because they simply cannot eliminate enough of the prior uncertainty to provide precise estimates. Error affects the optimal sample sizes much more strongly than the prior covariance. 174 Figure 7.52: Sample Size Determination for Trial 3:04. Even with an extremely high maximum sample size, the combination of high prior covariance, experimental error, and number of parameters provides a scenario that is difficult to overcome. Only the two least precise control regions satisfy the precision requirement for expected control, and the only coarsest region yields a risk-based design. This indicates that one cannot consistently achieve high estimator precision with fewer than ninety samples. 175 One can see that reducing the sample size is a much more difficult task with three parameters than with one or two (Figures 7.49 through 7.52). At trial 3:01, which represents the smallest covariance and measurement error, an optimal sample size does not exist for the strictest control region at ρ = 0.05 for either decision criterion. However, the decision criteria for the other three control regions converge relatively quickly. When the variance of the estimation error is increased in trial 3:02, the convergence of the sample size decision criteria takes a substantial hit: only the two largest (and least precise) control regions yield optimal sample sizes under either criteria. Trial 3:03 provides a slightly better result than 3:02; the two largest control regions yield optimal designs at smaller sample sizes than with 3:02 for both decision criteria. Additionally, the control region corresponding to ρ = 0.10 yields an optimal sample size at N = 54 samples for the decision rule based on expected control. One can see that increasing the measurement error inflicts a greater increase on the optimal sample size than increasing the prior covariance, but either one essentially eliminates the researcher’s ability to find a sample size that will provide a very precise estimate. The results of the final trial, 3:04, demonstrate what happens when a researcher encounters the perfect storm conditions of many parameters to estimate, very little prior information, and extraordinarily noisy observations. In this case, the expected control increases very slowly with the sample size as individual measurements are able to provide only a small amount of new knowledge to the paltry amount of prior information already in place. 
The control for the two smallest control regions increases very slowly and does not increase enough to approach the required threshold at 0.90. The expected control for the two largest control regions (ρ = 0.20 and ρ = 0.15) both eventually attain this threshold at N = 42 and N = 84 samples, respectively. However, only the largest region consistently provides enough precision to satisfy the second decision criterion, reducing the risk to an acceptable level at a sample size of N = 78 samples when the value of p t ’ is set to ten percent. This indicates that an experiment using 78 samples can consistently (more than 90% of the time) provide a 90% credibility interval within 20% of the estimate. When p t ’ is decreased to 0.05, none of the control regions are consistent enough to provide an optimal sample size. Table 7.03 indicates the optimal sample sizes for each trial under each of the decision rules. 176 Decision Rule 1: Control Decision Rule 2: Risk TRIAL 0.05 0.10 0.15 0.20 0.05 0.10 0.15 0.20 MAX 3:01 --- 36 12 6 --- 42 18 9 90 3:02 --- --- 42 15 --- --- 72 24 90 3:03 --- 54 21 9 --- --- 54 21 90 3:04 --- --- 84 42 --- --- --- 78 90 Table 7.03: Optimal Sample Sizes for Trials Using the Hill Sigmoid Model. Due to the number of parameters, prior distribution, and error variance, fewer optimal sample sizes exist for this particular experiment than with other trials. This shows that high precision is improbable using less than 90 samples. As with the trials using the other two experimental models, the optimal sample sizes listed in Table 7.03 were subjected to a two-stage evaluation procedure to examine how the experiments using these sample sizes behave. The first stage examines the effectiveness of the optimal sample sizes by computing the MAP estimates and 90% credibility intervals for four test cases, whose parameter values fall in different regions of the prior distribution. The widest credibility interval of twenty repetitions is illustrated for each case. The first two trials use the parameter vectors [5.94; 1.07; 0.758], [9.12; 1.44; 0.970], [14.0; 1.93; 1.24], [21.5; 2.59; 1.59] for the four test cases. As with the earlier experimental models, the first and fourth case employ parameters from the leading and lagging tail of the prior distribution, respectively, while the middle cases employ parameters taken from the middle of the prior distribution. Because trials 3:03 and 3:04 employ a wider covariance, the parameter values are spread out over the larger distribution: the vectors [7.76; 1.28; 0.872], [9.75; 1.48; 0.992], [12.2; 1.72; 1.22], and [15.4; 2.00; 1.28] represent the four test cases. Again, cases 1 and 4 correspond to outlier parameters, while the second and third cases fall within the central region of the prior distribution. These parameter values are indicated in the figures as horizontal dotted black lines. Figures 7.53 through 7.56 display the results for trials 3:01 through 3:04. As with the previous experimental models, the designs at the optimal sample sizes using both decision rules are represented in these figures, with the three-parameter estimates for each trial illustrated in the three panes. Since neither decision criterion was able to find optimal sample sizes for some of the smaller control regions, only the three 177 largest control regions are represented in trials 3:01 and 3:03, and only the two largest in 3:02 and 3:04. 
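The three-element parameter vectors listed above can be read against a conventional Hill parameterization, in which the components are the final steady-state value, the input eliciting the half-maximal response, and a slope exponent, matching the parameter descriptions that follow. The sketch below uses that standard textbook form purely for illustration; the exact model equation used in these trials is the one defined for the Hill sigmoid earlier in this work, and the input grid shown is arbitrary.

    import numpy as np

    def hill_sigmoid(x, theta):
        # Conventional three-parameter Hill sigmoid: theta = [y_max, x_half, h],
        # i.e. the final steady-state value, the input giving a half-maximal
        # response, and the slope exponent.  Illustrative form only.
        y_max, x_half, h = theta
        return y_max * x ** h / (x_half ** h + x ** h)

    # Example: response over an arbitrary input grid at the first test case
    # of trials 3:01 and 3:02.
    x = np.linspace(0.1, 10.0, 50)
    y = hill_sigmoid(x, np.array([5.94, 1.07, 0.758]))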
For those control regions that yielded an optimal sample size, the estimates remain relatively accurate in accordance with the results from the previous experimental models. Based on the estimator accuracy and the width of the credibility intervals, the first parameter, which corresponds to the final steady-state value, is the easiest to estimate, while the second parameter, which corresponds to the input that produces a half- maximal response, is the most difficult to estimate. As with the previous trials, the credibility intervals for the optimal sample sizes determined first decision rule typically fall within the boundary designated by the control region with some variation between the test cases, while those from the second decision rule always fall within the boundary with a relatively consistent credibility interval. These results do not provide any new or unexpected results, but illustrate that sample size reduction can be performed successfully on three- parameter experiments without significant loss in efficiency as long as the prior distribution and experimental error are kept adequately in check. Finally, the second evaluation computed curves for the information and accuracy as a function of sample size for each trial. Figures 7.57 to 7.60 illustrate these curves with the optimal sample sizes superimposed. The result of trial 3:01 exhibited a combination of the two behaviors previously exhibited by the optimal sample sizes. Those corresponding to the most precise control regions occur where the error has diminished to a fraction of its original value, while the sample sizes corresponding to larger control regions occur at lower values with reasonable estimation error. The other three trials exhibit the second behavior, which exclusively captures the point of diminishing marginal utility. Since, the estimation error produced by the smallest designs is typically quite large, its lowest value being about seventeen percent in trial 3:01, the other behavior (in which the less precise sample sizes exist while the error is still decreasing) does not exist. Another interesting note is that for these trials, the error for the MLE and MAP estimators are noticeably different, which does not occur with the previous experimental models. The MAP accuracy appears better for low sample sizes, where the prior information can adequately supplement the data obtained from the experiment; however, as the sample size increases, the MLE estimator, which only uses the data and ignores the prior information, provides more accurate estimates. 178 Figure 7.53: 90% Credibility Intervals Using Optimal Sample Sizes for Trial 3:01. These trials estimate three parameters at each of the optimal sample sizes. The second parameter, corresponding to the input that elicits the half-maximal response, proves to be the hardest to estimate. The large sample sizes deliver the specified precision in this parameter, with more precise estimates for the other two. As always, the risk criterion yields precision that faithfully adheres to the design specification. 179 Figure 7.54: 90% Credibility Intervals Using Optimal Sample Sizes for Trial 3:02. The increased experimental error yields far fewer optimal sample sizes than in previous experiments. The increased sample sizes that are required support the original design points and minimize the effectiveness of the experimental error. Consequently, the estimates retain their designated precisions regardless of where their parameter vector falls in the prior distribution. 
180 Figure 7.55: 90% Credibility Intervals Using Optimal Sample Sizes for Trial 3:03. Increasing the prior covariance while maintaining low experimental error yields smaller optimal sample sizes than the converse situation seen in the previous trial. Additional design points provide additional information, which erases the prior uncertainty regarding the parameters. Serializing a set of experiments using the posterior of the previous as the prior of the next can help to make this process run more smoothly. 181 Figure 7.56: 90% Credibility Intervals Using Optimal Sample Sizes for Trial 3:04. Even in the worst of experimental conditions, the optimal sample sizes perform exactly as they are designed to do. One can easily see which parameters are the most difficult to estimate, and which are easy. The risk-based designs, which should yield a 90% CI of less than 20% of the mode value, adhere very closely to the design specification along the second parameter, while the first parameter estimate is very precise. 182 Figure 7.57: Experimental Accuracy and Information for Trial 3:01. The estimates become very accurate relatively quickly. Therefore, less precise sample sizes occur early, while ones that are more precise occur later. Figure 7.58: Experimental Accuracy and Information for Trial 3:02. Increasing the experimental error adversely affects the accuracy. Each of the optimal sample sizes occurs after the error has leveled out, adhering to the expected behavior in high-error situations. 183 Figure 7.59: Experimental Accuracy and Information for Trial 3:03. The optimal sample sizes tend to occur after the error has decreased to below 10%. Precise estimates are often accurate ones. Figure 7.60: Experimental Accuracy and Information for Trial 3:04. In this case, small experiments are not particularly accurate. The optimal sample sizes all occur well after the elbow, where the error curve has leveled off. 184 7.2 Results from Practical Experiments The next phase of evaluation of the proposed SSD algorithm involved its practical application to actual experimental systems. The procedure followed the same general plan as with the first phase, within the limits of the individual experiment. First, the optimal designs were computed up to a maximum sample size, which was dictated by the size of the original experiment, N + . Then, the MPSRF posterior predictive diagnostic detailed in Section 4.4 evaluated the proposed chain length for convergence and stationarity. The optimal sample sizes for each of the four elliptical control regions corresponding to ρ = 0.20, 0.15, 0.10, and 0.05 were determined by the risk-based decision criterion, using the results from the control criterion as a backup. Finally, the experiment was re-conducted for each trial by either collecting new data experimentally or extrapolating the original data set to the optimally designed experiments. From this new data, this work estimated the parameters and 95% precision interval using the same estimator as the original investigators. Finally, this work demonstrated the ability of the proposed SSD method to reduce the original experiments without sacrificing the quality of the results by comparing these results to the originally performed, suboptimal experiment. 7.2.1 Sallen-Key Low Pass Filter The first experiment involved the Sallen-key low pass filter. 
This experiment determined the values of the resistances at R 1 and R 2 by inputting a sinusoidal voltage signal of fixed amplitude or RMS voltage into the circuit and measuring the amplitude or RMS voltage of the output signal. The gain of the filter was computed as the ratio of the input and output signals (either amplitude or RMS) and expressed as a function of the input signal frequency, in Hertz. Three different circuits with different values for R 1 and R 2 were measured twice each: using amplitude and RMS voltages, for six trials in all. The naïve design in this case was a series of 38 measurements between 0.01 and 100 Hz with approximately logarithmic spacing. The optimal designs were computed for sample sizes between two and forty samples in increments of two samples for sample sizes below twenty observations, and increments of four above twenty samples. The 185 computation time of these optimal designs averaged a few hours for designs at the largest sample sizes. The MPSRF posterior diagnostic, shown in Figure 7.61, determined that a maximum Markov chain length of fifty thousand random samples converges to the stationary distribution for experiments of two, ten, twenty, and forty observations. Figure 7.62 shows the sample size determination using both criteria. The algorithm was unable to determine an optimal sample size for either decision rule at the most precise control region, corresponding to a radius parameter of ρ = 0.05. The risk-based decision rule yields optimal sample sizes for the two largest control regions (at N = 14 and 28 samples) at p t ’ = 0.10, while the control-based decision rule recommends experiments at N = 6, 12, and 28 samples. Because there is some overlap between the sample sizes prescribed by the two criteria, this work examined experiments performed at the sample sizes determined by the first criterion, which fully replicates the designs of the other criterion and includes an additional experiment at N = 6 observations. Figure 7.61: MPSRF Posterior Diagnostic for Sallen-Key Low Pass Filter. Since this model has two parameters, a maximum length of fifty thousand samples was employed at this stage. For each sample size, the reduction factor stabilizes; it reaches one for N = 2 and 10, and approaches very near one for the largest sample size. Therefore, this length provides a stationary Markov process for the experiment. 186 Figure 7.62: Sample Size Determination for Sallen-Key Low Pass Filter. In addition to computing the SSD criteria for the optimal designs, the practical experiments also compute SSD scores for the original naïve design at each control region. These scores are indicated by the grey markers at 34 observations. As expected, the optimal designs provide superior precision. The expected control criterion shows that optimal sample sizes exist at 6, 12, and 28 observations (ρ = 0.20, 0.15, and 0.05). 187 Figure 7.63: Comparative Precision between Naïve and Optimal Low Pass Filter Experiments. The MAP estimates and 95% credibility interval for each parameter is illustrated. While the naïve design provides relatively good precision, the ρ = 0.15 (green) design is comparable in most trials, and the ρ = 0.10 (yellow) design displays equal to superior precision across all of the trials. For reference, the point estimate of the naïve design is illustrated as the horizontal dotted line. 
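The comparison in Figure 7.63 reduces to extracting, for each trial and design, a MAP point estimate and a credibility interval from the corresponding posterior chain. A minimal sketch of that extraction is given below, assuming the chain and its (unnormalized) log-posterior values are stored as arrays; the helper names are illustrative, and the same routine yields the relative error-bar lengths annotated in the earlier credibility-interval figures.

    import numpy as np

    def map_and_interval(chain, log_post, level=0.95):
        # chain: (n_draws, n_params) posterior sample; log_post: the draws'
        # unnormalized log-posterior values.  Returns the highest-density draw
        # as the MAP estimate and a central credibility interval per parameter.
        theta_map = chain[np.argmax(log_post)]
        alpha = 0.5 * (1.0 - level)
        lo, hi = np.quantile(chain, [alpha, 1.0 - alpha], axis=0)
        return theta_map, lo, hi

    def relative_half_width(theta_map, lo, hi):
        # Length of the longer error bar relative to the point estimate, the
        # quantity compared against the control-region size rho.
        return np.maximum(hi - theta_map, theta_map - lo) / np.abs(theta_map)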
Once the optimal sample sizes were determined for each of the control regions, this work collected new data from each circuit using both the amplitude and RMS measurements at each optimal design. Whenever the optimal design calls for replicated measurements, this work employed observations at different input voltages for the same frequency. This way, one can compute multiple independent evaluations of the circuit gain for a single input frequency. In addition, the input frequencies of the optimal design were rounded to accommodate the spectral resolution limitation of the function generator. The parameters and ninety-five percent credibility intervals were estimated using MAP for all six of the trials using the data from both the reduced and naïve designs. Figure 7.63 diagrams these results. The grey, five-sided star markers indicate the parameter estimates and 95% credibility intervals provided by the thirty-four point naïve design, while the designs corresponding to the optimal sample sizes of N = 6, 12, and 28 observations correspond to the blue, green, and yellow markers and error bars, respectively (matching the colors of the ρ = 0.20, 0.15, and 0.10 regions). Additionally, a horizontal, dashed reference line corresponds to the point estimate of the naïve design at each trial. One can see that in each trial, even the estimates provided by the naïve design contain some uncertainty; their credibility intervals cover about ten to fifteen percent of the magnitude of the point estimate. It is interesting to note that the credibility intervals from the optimal sample sizes are not appreciably larger than this. In fact, the design corresponding to a sample size of N = 28 observations (yellow) provides roughly equivalent results for both the point estimates and 95% credibility intervals across all six trials, with a 20% reduction in sample size from the naïve design. The smaller, less precise designs (blue and green markers) also provide point estimates that are very similar to the results of the naïve experiment, although the credibility intervals of these designs are slightly broader.

Figure 7.64: Original and Reduced Designs for Sallen-Key Low Pass Filter. The optimal design samples only in regions of high information, and ignores many areas inefficiently employed by the naïve design. This figure illustrates the designs used to produce the results in Figure 7.63 and demonstrates the reduction in sample size possible with a careful, efficient experiment design and the proposed SSD algorithm.

Finally, the reduced designs proposed by the SSD procedure are displayed in Figure 7.64, which compares the size of the competing designs and their respective sampling locations against the mean response of the system. In this diagram, the evenly spaced naïve design is indicated by the series of grey, five-sided stars. A series of yellow, green, and blue markers indicates the designs at the optimal sample sizes corresponding to the control regions of ρ = 0.10 (N = 28), 0.15 (N = 12), and 0.20 (N = 6), respectively. One can see from the locations of the optimal samples that only certain regions of the response curve provide information that generates precise estimates. For example, the naïve design uses many design points at low and high frequencies, where the gain does not change very much.
Conversely, the optimal design contains a set of samples at a very low frequency, near zero, but completely ignores the region at high frequency near zero gain. The optimal design instead focuses the bulk of design points on the sloping region where the response range is widest, using this information to compute precise estimates of the parameters. This provides a more efficient use of samples than the naïve design, and illustrates the concept of sample size reduction: that one can decrease the sample size dramatically by sampling only in regions of high information. 7.2.2 Fluorescence of Indocyanine Green in Blood The next evaluation recreated the experiment carried out by Maarek et al. (2004) to describe the stability of the fluorescence of indocyanine green (ICG) in aqueous solution, both with and without the addition of sodium polyaspartate (PASP). The objective of the experiment is to estimate the three model parameters, which dictate the shape of the response curve over a given range of ICG concentration in solution. The experiment was carried out by making a series of serial dilutions of ICG in aqueous solution and measuring the fluorescence in millivolts, as the response of a spectrophotometer. This experiment consists of six trials in all, which correspond to measurements of two solutions (a control solution of ICG and another of ICG with PASP added), incubated for 10, 21, and 76 hours. The naïve design consisted of a series of nineteen measurements obtained from serial dilution, which range from zero to 0.1271 molar ICG. Due to the nature of the serial dilution technique, the spacing of the naïve design points is approximately logarithmic. 190 Figure 7.65: MPSRF Posterior Diagnostic for ICG Fluorescence. For the smallest sample size, the reduction factor converges to one very quickly. However, for the larger sample sizes, this value does not approach one at all. However, the reduction factor does stabilize. Closer inspection of the between and within chain variances shows that these values are very small, which skews their ratio to larger values. This additional study verified that the Markov process is stationary at these sample sizes. The EID-optimal designs for this experimental model were computed for sample sizes ranging from three to eighteen samples. Although eighteen samples represent a relatively small design, the purpose of this phase of evaluation is to reduce the sample size of the original experiment, which only employs nineteen observations. In spite of the differences in their model equations, the response of this model closely resembles the response of the exponential rise and fall experimental model used in the previous phase of evaluation. As one might expect, the regions of high information as determined by the optimal design is analogous to that model. The information is highest in the parts of the response that exhibit rapid increase of the response, or in the decreasing part of the curve, especially in the response tail; since the parameter B is simply for scaling of the curve, it does not dramatically influence the location of the optimal design points. Taking a cue from the previous three-parameter experiments, the minimum Markov chain length for this trial was set to one hundred thousand samples. Posterior diagnostics were applied to the designs of three, nine, and eighteen samples to ensure that their Markov chains will converge to a stationary 191 distribution (Figure 7.65). 
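For reference, the reduction factor reported in these diagnostics can be sketched as the multivariate potential scale reduction factor of Brooks and Gelman (1998), computed over the parallel chains. The listing below assumes the chains are stacked into a single three-dimensional array and omits the refinements described in Section 4.4, such as averaging over the simulated observation vectors and discarding the largest one percent of values.

    import numpy as np

    def mpsrf(chains):
        # Multivariate potential scale reduction factor (Brooks and Gelman, 1998).
        # chains has shape (m, n, p): m parallel chains, n draws each, p
        # parameters.  Values near one indicate approximate stationarity.
        m, n, p = chains.shape
        means = chains.mean(axis=1)                  # per-chain means, shape (m, p)
        w = np.zeros((p, p))                         # within-chain covariance
        for j in range(m):
            dev = chains[j] - means[j]
            w += dev.T @ dev / (n - 1)
        w /= m
        dev_means = means - means.mean(axis=0)       # between-chain spread
        b_over_n = dev_means.T @ dev_means / (m - 1)
        lam = np.linalg.eigvals(np.linalg.solve(w, b_over_n))
        return (n - 1) / n + (m + 1) / m * np.max(lam.real)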
As seen with previous runs of the posterior diagnostic, the speed of convergence proceeded inversely to the sample size of the experiment. In this case, the reduction factor for the three- point design converges to a value of one very quickly, while the other two Markov chains converge much more slowly, and not to one. However, both the N = 9 and 18 reduction factors do taper off to pseudo- convergence. Upon closer inspection, the within-chain and between-chain variance terms that contribute to the reduction factor have converged to very small values whose ratio is somewhat greater than one. Considering that both examples exhibit reduction factors that are barely decreasing at 100,000 samples, this work concluded that these chains have converged and that the MPSRF diagnostic has failed as the result of relatively precise posterior distributions that reject a lot of samples and not from a lack of mixing. Both criteria for sample size determination were computed for the EID-optimal designs at each candidate sample size. Figure 7.66 shows that due to the relatively small maximum sample size for the experiment and relatively uninformative prior distribution, the control and risk criteria do not have much time to converge to the decision thresholds for many of the control regions. However, the risk-based decision rule at p t ’ = 0.10 found optimal sample sizes for the two largest control regions (N = 6 and 9) with relative ease. In addition, the control criterion found an optimal sample size for the control region at ρ = 0.10 at twelve observations. Therefore, this experiment computed parameter estimates and confidence intervals for these three sample sizes and compared them to the results from the original 19-sample experiment. Once the optimal sample sizes were determined for each of the control regions, it was necessary to collect data for the optimal designs at the reduced designs. Since it was not possible to re-conduct the experiment at these new designs, this additional data extrapolates between the data points collected by the naïve experiment. In instances where the optimal design requires samples from outside of the range of the naïve design, new data was simulated from the experimental model using the least-squares estimate of the naïve data set, with five percent noise added. Since the original data for this experiment consists entirely of integer values, these new observations were rounded to the nearest integer to improve the authenticity of the data. 192 Figure 7.66: Sample Size Determination for ICG Fluorescence. The low sample size of the naïve design (19 observations) increases the difficulty of sample size reduction for this experiment. However, the expected control and risk of the naïve design are both inferior to that of the optimal designs. Based on the SSD procedure, this work employs sample sizes of 15, 9, and 6 observations, which correspond to the risk designs for ρ = 0.20 and 0.15, and the expected control design for ρ = 0.10. 193 Figure 7.67: Comparative Precision between Naïve and Optimal ICG Fluorescence Experiments. This figure displays the least-squares estimates and 95% confidence intervals for each parameter. For most trials, the estimates and precision are consistent between the naïve and reduced optimal designs. The exception is the second parameter, which is particularly difficult to estimate. 
This work estimated the model parameters and 95% confidence intervals at each trial using least-squares estimation from both the naïve data sets and from the new “optimal” data sets; these results are illustrated in Figure 7.67. As with the previous experiments, a grey five-sided star and horizontal dotted line indicate the results of the original Maarek et al. experiment, while a series of colored markers indicates the results of the optimal experiments at the appropriate control region. One can see that each of the designs adequately estimates the B and k 830 parameters for the model. Each trial shows strong consistency of all of the optimal designs with the naïve design. In general, the point estimates are consistent across the trials, but they have confidence intervals of varying sizes. 194 However, the k 775 parameter, which corresponds to the initial increase in response, provided more difficulty in estimation. This is because the increasing portion of the response curve occurs very quickly over a small range of concentration values that the original naïve experiment did not account for. Consequently, the extrapolation of data for the repeated measurements within the optimal design ends up replicating the same observation continuously; in a real experiment, each of these measurements would yield a different result. Also, the response is not particularly sensitive to the k 775 parameter; if one plots the response for each of the parameter estimates at each trial, these curves are very similar to one another (Figure 7.68). This lack of sensitivity makes this parameter very difficult to estimate accurately, but also makes its precise estimate rather unimportant. Since the objective of an experiment is to characterize the response of the model and not simply collect numbers, this degree of estimation error is acceptable. Figure 7.68: Insensitivity of the Model Response to Differences in k 775 Parameter. This illustrates the response for the ICG Fluorescence model using the parameter estimates for the naïve and optimal designs for trial 3, which demonstrates the largest divergence from the results of the naïve experiment. In spite of the differences in their results, the responses using each of these different parameter estimates are almost indistinguishable from one another, proving the equivalence of the naïve and optimal experiments. 195 Figure 7.69: Original and Reduced Designs for ICG Fluorescence Experiment. The optimal design includes samples from the long tail, which are not covered by the naïve design. A naïve design that includes this tail would require many more samples, making the size reduction algorithm more efficient than this design. The precision of the naïve, green, and yellow estimates are equivalent. Figure 7.69 illustrates the reduced designs provided by the SSD procedure. Again, the five-sided stars indicate the original design, while the colored markers designate the different optimal designs. The primary difference between the naïve and optimal designs is in the tail; the original experiment does not sample here at all, in spite of the information it holds. It is also very important to capture the increasing portion of the response curve. The optimal design does this by repeatedly sampling at a few low concentrations to annihilate the effect of the experimental error. Practically, this is easier than evenly spacing the observations along the slope, which requires multiple serial dilutions. 
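The replicate-sampling strategy just described relies on the familiar fact that, for independent readings, the standard error of an averaged measurement shrinks with the square root of the number of replicates. The short simulation below illustrates that effect; the fluorescence value, the five percent noise level (borrowed from the simulated-data convention described above), and the replicate counts are hypothetical stand-ins rather than values from the ICG data set.

```python
import numpy as np

rng = np.random.default_rng(1)
true_response = 250.0                  # hypothetical fluorescence reading (mV) at one low concentration
sigma = 0.05 * true_response           # assumed 5% measurement noise

for n_rep in (1, 4, 9):
    # simulate many repeated experiments, each averaging n_rep replicate readings
    trials = rng.normal(true_response, sigma, size=(20_000, n_rep)).mean(axis=1)
    print(n_rep, round(float(trials.std()), 2))   # empirical SE ~ sigma / sqrt(n_rep)
```

Nine replicates at a single concentration therefore behave roughly like one reading with a third of the noise, which is why stacking observations at a few informative concentrations can substitute for spreading them along the slope.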
Each reduced design provides comparable results to the naïve experiment; the twelve-point design shows almost identical precision to the naïve design. While this might not save a great deal of effort over the original work, an alternative naïve design that incorporates a greater sampling of the response tail might be much larger, and the twelve-point design would represent a more impressive reduction over N+. The reduction in this case works, but does not exhibit the true potential of the proposed SSD method. 7.2.3 Determination of Anaerobic Threshold The final example used for this phase of evaluation entailed the noninvasive determination of anaerobic threshold, which involved putting a subject on a treadmill or other exercise device and measuring the minute ventilation of the subject at increasing workout intensities. The minute ventilation at each level of intensity was determined by measuring the flow of air as the subject inspires and numerically integrating over the time course of the intensity level to determine the total inspired volume, which one can easily normalize to liters per minute. Since the AT is a secondary parameter in this system, this work must estimate the slope and y-intercept of each line segment in order to compute the intersection of the lines. This experiment was carried out over five trials, each of which corresponds to a different human subject; the five subjects all share a similar fitness level, with AT values falling on the interval of 90 to 135. The naïve design for these experiments involved a twenty-sample design that started the subject at a zero-watt workload and gradually increased the load by fifteen watts at regular time intervals until the intensity reached 300 watts or the subject was exhausted and could not continue. Optimal designs were not computed for this experiment. Instead, the design method detailed in Section 6.3.2 was used to generate “optimal” designs for four to twenty samples, increasing by increments of two samples, for a total of nine candidate designs. This work selected a maximum Markov chain length of fifty thousand samples and computed the MPSRF of AT for the designs at four, ten, and twenty observations. One can see from Figure 7.70 that in each case, the diagnostic converged very near one after fifty thousand samples. The smaller designs converged completely to one, while the largest design converged to about 1.12. This work considered this result acceptable for a few reasons. First, the value of 1.12 is relatively close to one. Second, the reduction factor has clearly stabilized; its decrease is slow enough that, using a power regression, one can predict that the reduction factor will not reach 1.0 until more than 200,000 samples have been used. Finally, upon closer inspection of the between and within chain variance terms of the reduction factor, it was evident that the lack of absolute convergence was due to the usual problem that the magnitudes of the factors are very small at this large sample size, and that errors that are typically invisible manifest themselves at this small scale. This same problem was seen in the previous examples. Figure 7.70: MPSRF Posterior Diagnostic for Anaerobic Threshold. Because of the low maximum sample size, the diagnostic is computed for three sample sizes. As with the other examples, the reduction factor for the Markov process that describes the AT parameter converges to one.
Therefore, the chain length of fifty thousand samples provides a stationary description of the posterior predictive distribution. Once the MPSRF diagnostic determined that the prescribed Markov chain length was adequate to produce a stationary representation of the posterior distribution, this work computed the expected control and risk for the SSD procedure (Figure 7.71). This experiment proved to be the hardest of all to reduce using the proposed SSD decision rules, mainly because of the wide prior distribution, simplistic model, small naïve sample size, and somewhat noisy measurements. The expected control curves in the top pane of the figure increase very slowly – almost linearly. The two least precise regions reach threshold at twelve and four samples at Q 0 = 0.90, but the two most precise control regions do not reach, or even approach, the threshold. The risk criterion only finds an optimal sample size for the ρ = 0.20 control region for p t ’ = 0.10, corresponding to a sample size of sixteen observations; the scores for the other control regions failed to converge. For the evaluation portion of this experiment, this work selected the designs whose sample sizes fall at N = 4, 12, and 16 observations, and compared their estimates and confidence intervals to the results of the twenty-point naïve design. 198 Figure 7.71: Sample Size Determination for the Determination of Anaerobic Threshold. Since this experiment does not employ an optimal design, the 21-point naïve design provides better control and risk than smaller designs. The objective of SSD in this case is to reduce the workload on the subject by shaving off a few intensity levels. The expected control shows that optimal sample sizes occur at 12 and 4 observations (ρ = 0.20 and 0.15). 199 Figure 7.72: Comparative Precision between Anaerobic Threshold Experiments. The anaerobic threshold and 95% confidence intervals were computed using least-squares estimation. Due to the large amount of noise in the experimental data, the precision of the estimate is quite large, even for the naïve design. Smaller designs of almost half the size consistently demonstrate equivalent accuracy and precision across all subjects. As with the previous experiment, it was impossible to collect data directly for the optimally sized designs. Even if it were possible to bring the same subjects back to the laboratory to collect more data, it would not be compatible with the previously obtained results, since the AT is an in vivo parameter that fluctuates from day to day with the “physical fitness” of the individual. Therefore, the data for the optimally sized experiments was obtained by linearly extrapolating from the naïve designs. This work computed the point estimates and ninety-five percent confidence intervals for each of the reduced designs and for the naïve design using the least squares estimator to determine the slope and intercept of the line segments on either side of the AT range; the anaerobic threshold was estimated from the intersection of these lines. As illustrated in Figure 7.72, the point estimates for the naïve and reduced designs are approximately equivalent. There is about a five percent difference between the point estimates in every trial. Second, the 200 95% confidence intervals for these estimates are very large, even for the naïve design. This is likely the result of the experimental error, which is prevalent. 
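For readers unfamiliar with the estimator used here, the sketch below reproduces the two-segment least-squares procedure described above on a fabricated workload/ventilation series: fit one line below the suspected threshold, one line above it, and report the workload at which the lines intersect. The breakpoint, slopes, noise level, and the split points used to separate the two regions are all invented for illustration and do not correspond to any of the study subjects.

```python
import numpy as np

# Fabricated minute-ventilation data (L/min) versus workload (W); the breakpoint at
# 135 W, the slopes, and the 2 L/min noise are assumptions made for illustration only.
workload = np.arange(0.0, 315.0, 15.0)
rng = np.random.default_rng(2)
ventilation = np.where(workload < 135.0,
                       20.0 + 0.2 * workload,
                       47.0 + 0.6 * (workload - 135.0))
ventilation = ventilation + rng.normal(0.0, 2.0, workload.size)

# Fit separate least-squares lines well below and well above the suspected "elbow",
# then estimate the anaerobic threshold as the intersection of the two lines.
low = workload <= 90.0
high = workload >= 180.0
slope_lo, intercept_lo = np.polyfit(workload[low], ventilation[low], 1)
slope_hi, intercept_hi = np.polyfit(workload[high], ventilation[high], 1)
at_estimate = (intercept_lo - intercept_hi) / (slope_hi - slope_lo)
print(round(float(at_estimate), 1))   # close to the fabricated breakpoint of 135 W
```

In practice, the uncertainty of the intersection is driven by the uncertainty in all four fitted coefficients, which is why the noisy breathing data produce the wide intervals seen in Figure 7.72.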
However, despite the relative lack of precision in the parameter estimates, because of the high magnitude of the AT values, the confidence intervals provided by the reduced experiments do not generally exceed the radius parameters of their corresponding control regions. While this feature was discussed in the first phase of evaluation, it is especially useful in this experiment, where precision is at a premium and even the naïve design has an imprecise credibility interval. Figure 7.73: Original and Reduced Designs for the Determination of Anaerobic Threshold. Unlike the original design that samples from zero to 300 watts of intensity, the “optimal” designs avoid the region near the threshold “elbow” and the region of high workout intensity. Reducing the sample size can do more than reduce the workload on the subject; each level of intensity can be run for a longer time, which could reduce the noise in the experiment and improve the precision of the AT estimate. Finally, the original and reduced designs for this experiment are compared in Figure 7.73. The primary difference between the naïve design and the “optimal” designs is the efficiency with which the design points are spaced; the new designs do not sample near the elbow region around the anaerobic threshold and minimize the number of samples at high workload. This both reduces the strain on the subject and segregates the observations into groups above and below the threshold. As stated in Section 6.3.2, this strategy eases the load on the estimator by separating the data set into two distinct linear regions, where it is clear which data belongs to which region. Since the confidence intervals provided by the reduced designs are roughly similar to those of the naïve design, this experiment demonstrates a good opportunity to reduce the sample size. Additionally, much of the error involved in this experiment results from erratic breathing patterns of the subject that might be mitigated by extending the time course of each exercise intensity level. By using fewer levels, the experimenter can run each for longer and improve the precision of the AT estimate. Based on the comparable estimates and 95% confidence intervals, and the preferred configurations employed by the new designs, one could easily employ the proposed SSD algorithm to reduce the sample size of this experiment. 7.3 Discussion 7.3.1 Behavior of the Algorithm The evaluation segment of this work demonstrates that the proposed precision-based method for sample size determination and reduction behaves admirably. It integrates seamlessly into the usual experimental framework, requiring only a modest degree of planning and computer simulation prior to the collection of data. It effectively predicts the credibility interval for the parameter estimates for a given number of observations and shows its usefulness in the laboratory for a practical research application. This section addresses the results of the evaluation and explains how the proposed method of sample size determination satisfies each of the requirements described in Section 6.1. The primary objective of the first stage of the evaluation procedure is to demonstrate that the actual experimental results adhere to the design specification: that the credibility intervals of the estimates produced by the proposed algorithm adhere to the size of the control region that generates the optimal sample size, regardless of the actual parameter value.
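The “control” referred to throughout this discussion can be pictured with a small amount of bookkeeping: for each simulated posterior chain, count the fraction of links that land inside a region of radius ρ around the chain’s central estimate, then average that fraction over the K simulated outcomes and compare it with the acceptance threshold. The sketch below does exactly this for a one-parameter case; treating ρ as a relative radius, centering the region on the posterior mean, and the Q0 = 0.90 threshold (the value quoted for the anaerobic threshold example) are assumptions made for illustration, and the Gaussian stand-in chains take the place of real Metropolis output.

```python
import numpy as np

def chain_control(chain, rho):
    """Fraction of a one-parameter posterior chain lying within a control region
    of relative radius rho about the chain's central estimate.

    Using the posterior mean as the region's center is an assumption made here,
    not a detail taken from the XRedDesign implementation.
    """
    center = chain.mean()
    inside = np.abs(chain - center) <= rho * np.abs(center)
    return inside.mean()

# Expected control over K simulated posterior predictive chains, compared with a
# Q0 = 0.90 acceptance threshold (K and the chain statistics are illustrative).
rng = np.random.default_rng(3)
K, links = 1000, 5000
controls = np.empty(K)
for k in range(K):
    posterior_sd = rng.uniform(0.03, 0.08)             # spread varies by simulated outcome
    chain = rng.normal(1.0, posterior_sd, size=links)   # stand-in for one Markov chain
    controls[k] = chain_control(chain, rho=0.10)
print(controls.mean() >= 0.90)    # expected-control decision at Q0 = 0.90
```

Roughly speaking, the risk-based rule works on the same per-chain fractions but, instead of averaging them, asks what proportion of the K chains fail to reach the required control, so it penalizes diffuse outcomes that an average could otherwise hide.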
The ability to predict an expected or worst-case scenario for the precision of the parameter estimate of an experiment based on its sample size is an invaluable tool for experiment design, since it allows the researcher to capture the desired degree of precision using the minimum required sample size. Each of the first-stage trials, 1:01 through 3:04, demonstrates that the 90% credibility intervals remain near or within the radii of their respective control regions for both decision rules, even when the “true” parameter vector comes from different regions of the prior distribution. The expected control rule, which provides average control over the prior distribution, adheres to the precision requirement near the center of the prior distribution, but occasionally shows larger credibility intervals when the actual parameter value falls in the tails of the prior distribution. In this case, the expectation balances good results near the center of the prior against bad results in the tails. Alternatively, the SSD rule that marginalizes the risk behaves well across the entire prior distribution and yields credibility intervals that are consistently as or more precise than the specification. In general, the credibility intervals yielded by the sample sizes prescribed by the risk-based decision rule were smaller than the size of the designated control region. This is because this particular criterion ensures that virtually all of the Markov chains meet the required precision level; many are overly precise in order to bring the more diffuse chains under the required level of control. Consequently, the risk-based sample sizes are larger than their control-based counterparts. However, the first decision rule, which acts only on the expected control of the posterior chains, often performs very closely to the specification of its respective control region. In principle, the expectation operator acts as a smoothing method, in which a small population of high precision can negate another population of low precision. Whenever the variance of the control is very high across the K posterior predictive Markov chains, this might inadvertently allow a number of imprecise chains, as long as others exist that can balance out this lack of precision. In the trials performed in this work, this does not appear to happen very frequently. However, one should note that these experiments are affected more strongly by the location of the “true” parameter value within the prior distribution; the credibility intervals provided by the sample sizes from the first decision rule tended to vary between test cases more so than those provided by the second rule. Depending on the degree of precision required by the researcher, the smaller sample sizes dictated by the first decision rule may be preferable to the security provided by the second decision rule and its larger sample sizes. Another requirement of the SSD method is that the optimal sample sizes provide point estimates and credibility intervals that show invariance across the parameter and observation space. Since the researcher does not know the exact parameter value of the system of interest prior to experimentation, the SSD algorithm must be able to incorporate prior knowledge and perform well regardless of the actual parameter value.
This behavior is evident in the first stage trials of the evaluation, in which the credibility intervals for the MAP estimates vary only slightly for sample sizes determined by the control-based decision rule, and not at all for sample sizes determined by the risk-based decision rule. In general, the precision of an estimate at a control region of a given size correlates to how far the “true” parameter value falls from the center of the prior distribution. For those instances where the parameter of the experiment falls in the tail of the prior distribution, the precision of each control-rule estimate hovers very near the radius of the control region, while the precision improves for parameter values nearer to the center of the prior. A similar pattern exists for the risk-rule estimates, which exhibit much smaller credibility intervals. This behavior occurs because the criteria use a number of different experimental outcomes from different Markov chains that arise from various parameter vectors and observations. The decision rules combine the results of these different outcomes, as either an average or an absolute threshold, and decide whether to accept or reject the proposed sample size based on the entirety of the different experimental outcomes. Chapter 3 discussed the information provided by an experiment and provided a convincing argument for the existence of an optimal sample size for an experiment based on the diminishing marginal utility of the information as the number of observations increases. Since the accuracy and precision of the parameter estimates of an experiment directly relates to the amount of information that the experiment provides, these quantities must also exhibit diminishing returns and taper off as the sample size increases. At the optimal sample size, the error has decreased to a fraction of its original value and additional observations add little to the experimental result. The first stage of evaluation computes curves for information and accuracy and demonstrates the diminishing effectiveness of increasing the sample size of these experiments without bound. By examining how the optimal sample sizes under each decision rule fall on these curves, one notices that many of the proposed designs capture the “turning point” of the marginal utility, where the 204 signal has begun to taper off and stabilize. The precise location of a given optimal sample size on the accuracy or information curve depends on the precision of the particular control region and the estimation error at the smallest sample size. The first behavior seen in the optimal sample sizes follows the original prediction that the optimal sample size will occur at sample sizes where the accuracy curve has diminished to a fraction of its original value and concaves upward. Optimal sample sizes exhibit this behavior for the smallest and most precise control regions and by designs for experiments that yield relatively low accuracy (greater than fifteen percent) at the lowest sample size. In addition to this primary behavior, a second, less expected behavior occasionally occurs for the larger control regions when the smallest sample sizes yield reasonably accurate results. In this instance, it was common to see the optimal sample sizes falling in the steepest part of the error curve. 
This generally occurred with experiments that estimated one or two parameters and had a relatively small prior distribution and little measurement error, which caused these experiments to achieve the desired precision very quickly. Since the proposed criterion for sample size determination works on the concept of estimator precision and not explicitly on the curvature of the error curve, this behavior does not oppose the claims of this research. On the contrary, a method that explicitly uses the curvature of the accuracy and information plots would unnecessarily waste experimental resources in these instances where the initial results are already within an acceptable tolerance. This demonstrates that while the proposed SSD method captures the diminishing marginal utility of the experimental information, it is also efficient in its use of experimental resources. Adjusting certain aspects of the Bayesian framework can provide an understanding of the behavior of the proposed SSD criterion. Small adjustments in the covariance of the prior distribution and experimental error variance can dramatically alter how the populations of posterior predictive Markov chains are distributed, change the shape of the SSD score curves for both criteria, and increase or decrease the optimal sample sizes. Increasing these aspects of the Bayesian framework influences the sample sizes determined by both criteria, but the control criterion results clearly illustrate how these factors affect the accumulation of experimental precision as one increases the sample size. The first-stage trials demonstrate that increasing either the prior covariance or the measurement error causes an increase in the optimal sample size, with the primary casualties being those optimal sizes that correspond to very precise control regions. For a single parameter, both factors have about the same effect on the sample sizes. However, as the number of parameters increases, one can see that increasing the variance of the measurement error increases the optimal sample size much more than increasing the prior covariance. This is an intuitive result, since a large prior distribution simply requires more information, and thus more observations, to achieve the necessary precision; the control curves show that the expected information obtained from each measurement does not change by increasing the prior covariance. However, an experiment in which the investigator cannot trust the observations presents a pathological scenario in which one must repeat each measurement multiple times to emulate a single, noise-free observation. A series of noisy observations without proper support might also present negative information about the parameters; the expression in (3.04) refers to the expected information over all possible observations, and does not eliminate the possibility of misleading experiments. Fortunately, it is usually much easier for a researcher to minimize the measurement error by careful experimentation than to reduce the prior distribution by amassing more a priori information. One can see that the combination of high prior covariance, high experimental error, and a large number of parameters condemns the experiment to low precision; this case requires an exorbitant number of samples to achieve even marginal precision. The combination of low information and a limited ability to gain new information ensures that achieving a required degree of precision is a difficult task.
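A back-of-the-envelope calculation in the conjugate normal model shows the same trade-off in closed form. If a single parameter has prior standard deviation σ0 and each observation carries noise standard deviation σ, the posterior variance after n observations is 1/(1/σ0² + n/σ²), so reaching a target posterior standard deviation comparable to a control radius ρ requires roughly n ≥ σ²(1/ρ² − 1/σ0²). The snippet below evaluates this bound for a few hypothetical values; it is only a linear-Gaussian caricature of the nonlinear, simulation-based criterion used in this work.

```python
import numpy as np

def required_n(prior_sd, noise_sd, target_sd):
    """Smallest n giving posterior SD <= target_sd in the conjugate normal model,
    where the posterior variance is 1 / (1/prior_sd**2 + n/noise_sd**2).
    """
    if target_sd >= prior_sd:
        return 0                                  # the prior alone already meets the target
    n = noise_sd**2 * (1.0 / target_sd**2 - 1.0 / prior_sd**2)
    return int(np.ceil(n))

# Hypothetical prior and noise levels; the target precision plays the role of rho.
for prior_sd, noise_sd in [(0.5, 0.1), (1.0, 0.1), (0.5, 0.2), (1.0, 0.2)]:
    print(prior_sd, noise_sd, required_n(prior_sd, noise_sd, target_sd=0.05))
```

Even in this caricature, once the target precision is much tighter than the prior, the noise variance dominates the required sample size while inflating the prior adds comparatively little, which loosely mirrors the trend observed in the trials, where the measurement error eventually outweighs the prior covariance.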
It is also critical that the proposed algorithm be applicable to a wide variety of models, including those with nonlinear responses to the input stimuli and parameter densities and models with a wide variety of response shapes. Both stages of evaluation subjected the SSD criterion to a wide variety of experimental models, including those corresponding to a wide range of engineering and scientific applications. While the three general response shapes used here (the decay, rise and fall, and sigmoid) hardly represent an exhaustive list of possible model responses, they do represent a relatively large chunk of model shape types. Many experiments share similar response shapes, even if their model equations are different; some of the models 206 in the second stage of evaluation (Section 6.3.2) evidence this. For example, the model for the Sallen-key low pass filter has the same shape as the Hill sigmoid, despite having a vastly different model equation. The model for ICG fluorescence shares the same shape as the exponential rise and fall model, but its model equation is also very different. This implies that these three models represent an incredible range of experimental models that one is likely to see in the laboratory. The flexibility of this method stems from how it defines and uses the experimental models. While typical methods rely on analytical derivations that employ assumptions of full or piecewise linearity in the model, Markov chain methods work in a purely numerical structure and employ a very high-level abstraction of a model simply as a system that relates a set of inputs to an output, without any other assumptions. This is a key advantage of Markov chain methods; the algorithm evaluates the model for multiple iterations at different parameters and input stimuli and characterizes the relationship between these values and their specific responses, without approximation. By computing the model responses individually and combining them, the proposed method avoids the holistic approaches that bog down analytical and quasi-numerical methods of experimental analysis. The final requirement for the proposed SSD method is that it must work with a reasonably large number of informative parameters. The first trial of evaluation employed models with increasing numbers of parameters, from one to three. Since the coefficients of variation for the prior distributions and variance of the measurement errors were equivalent across these three models, one can assume that the changes in the optimal sample sizes across similar trials (i.e., 1:01 to 2:01, 2:03 to 3:03, et cetera) are primarily due to the differences in number of parameters. Intuition states that a model with more parameters will require a larger sample size than one with only a single parameter; Tables 7.01 through 7.03 confirm this assumption. However, when the experiment employs the prescribed sample size, the number of parameters does not compromise the precision of the parameter estimates; for each experimental model, the largest credibility interval across the parameter space still adheres to the size of the designated control region. The sigmoid and ICG fluorescence experiments demonstrate the ability of the proposed method to determine the optimal sample size for a three-parameter example. 
In the latter case, the proposed criterion was able to compute optimal sample sizes for the three largest control regions using the expected control decision rule, and the 207 two largest using the risk rule despite the small size of the naïve design. The former case showed that the experimenter must be able to reduce the covariance of the prior distribution and the variance of the measurement error in order to ensure that the optimal sample sizes correspond to reasonably small values. Because the method of computing the expectation in (2.21) does not effectively apply to more than three parameters, this work imposes the three-parameter constraint based only on the limitation of the computation of the optimal designs; this method theoretically supports models that involve any number of parameters. 7.3.2 Sample Size Reduction The practical utility of the proposed method of sample size determination would be severely limited if a researcher could not apply it in an actual laboratory environment or if its results were consistently inferior to the original, full sized experimental designs. To ensure the utility of this criterion, the second phase of evaluation employed four examples taken from actual research and compared the parameter estimates provided by a series of reduced optimal designs with the results from the original, naïve design. Each of these experimental trials involved determining the Bayesian framework (including the prior distribution and variance parameters), computing the optimal designs at different sample sizes to determine candidate designs for the SSD procedure, and computing the MPSRF posterior diagnostics to ensure adequate length of the Markov chain. From the set of candidate designs, this work determined the optimal sample sizes for each of the four control regions used in the first stage of evaluation, which provided varying degrees of estimator precision. The primary difference between the actual experiments employed in the second evaluation stage and the simulated trials from the first stage of evaluation is the computation of a prior distribution and variance parameters. This is a critical step, as too large a covariance inhibits the convergence of the decision criteria, while too small a value will not adequately represent the experiment and disable the algorithm’s ability to compute an accurate parameter estimate. Using the correct variance parameters for the simulation influences the prediction as well; this behavior is particularly evident in the results of the anaerobic 208 threshold exercise, where the data is quite noisy. Since this experiment estimated a single parameter, the sample size determination should have been able to find sample sizes for even the most precise control regions. However, the combination of a large prior covariance and high measurement error variance ensured that only the largest control regions yield optimal sample sizes. This shows that as long as an investigator can compute optimal designs for an experiment and decide upon a prior distribution, the proposed method of sample size determination can be easily applied in practical experimental scenarios. However, if one hopes to achieve actual sample size reduction that can compete with a naïve design, a researcher must take great pains to ensure that the Bayesian framework is manageable, but still accurately describes the a priori experimental knowledge. 
This may require additional effort, better equipment, et cetera, to reduce the impact of noise and constrain the parameter values to a suitable prior distribution. In addition to being applicable to practical experimental scenarios, any viable method of optimal sample size determination must produce results that compare favorably with the naïve designs already in use. The ability to obtain an equivalent result to a naïve experiment using fewer observations is the essence of sample size reduction; employing a smaller design is not a viable option if the lack of information diminishes the experimental result. The second stage of the evaluation computed the results for a series of reduced designs and compared those results to those provided by the naïve design. In each of the phase-two trials, the precision and point estimates compare favorably to the original experimental result. In fact, in each trial, at least one of the optimal sample sizes produced a confidence interval that matched the precision of the naïve design. This reveals that the reduced experiments proposed by the SSD algorithm are equivalent to their original, naïvely-designed counterparts. Therefore, the reduced designs computed by the proposed SSD criterion are capable of providing comparable results to a naïve design. Thus far, this work has established that the reduced optimal designs computed by the proposed SSD criteria both adhere to a designated degree of precision and provide precision equal to that of a larger, naïvely-designed experiment. The final piece of this puzzle is how one can predict the precision of the naïve design so that a researcher might select the optimal sample size that corresponds to the appropriate control region. This work also computed the expected control and risk for the naïve design and plotted these results alongside those for the optimal designs. It is particularly interesting to note that the naïve designs are far from infallible, and that the optimal designs almost always provide better precision at the same sample size as the naïve design. As with the optimal designs, the SSD scores for the naïve design accurately predict the size of its credibility interval. Since the confidence and credibility intervals are loosely related, one can compare the predicted results between a naïve and optimal design using the expected control and risk. This provides the researcher with two potential methods of sample size reduction. If the required precision is known, then the researcher can compute the SSD for that value of ρ. Otherwise, the investigator can reduce the sample size by computing the expected precision for the naïve design that one would normally execute and employing the reduced design that will provide equivalent results. Using the proposed SSD algorithm rather than simply choosing an arbitrary naïve design eliminates the guesswork involved with such designs and provides the researcher with partial control over the precision of the parameter estimates using fewer observations. The path to better and less costly experiments lies with methods such as this, which allow the researcher to work smarter and more efficiently without sacrificing the quality of the experiment, and not through the various draconian cost-cutting measures to which many laboratories have resorted in order to remain within their budgets. 7.3.3 Computation Time Sample size reduction in a Bayesian experiment does not come without cost.
The improvements that the proposed SSD method provides arise from transferring the burden of experimentation from the researcher in the laboratory to computer simulation, which is a far less expensive, more scalable, and more upgradable solution: one can always buy a faster computer, but finding faster researchers proves much more difficult. The primary costs incurred by employing computerized design and simulation are the initial costs of capital and the time required to perform the procedure for sample size determination. Therefore, no discussion of this work would be complete without addressing the issue of computation time. The results provided by this work were computed using the proprietary XRedDesign API running on commercially available personal computers under the Microsoft Windows XP operating system; no special servers or clusters were employed to carry out this procedure, nor was any additional software required. The employed configuration represents the minimum level of computing hardware that one would expect to find in a laboratory setting and easily falls into the budget of any project. The computers employed ranged from the computationally low end to the current state of the art: a Pentium 4 desktop (2.6 GHz) with 4 GB RAM, a Pentium M notebook (2.0 GHz) with 2 GB RAM, two Pentium D desktops (3.0 GHz and 3.2 GHz) with 2 GB RAM, and a Pentium Core 2 Duo desktop (2.7 GHz) with 4 GB RAM. Optimization of the program for each processor was achieved using the Intel C++ 9.0 compiler. Each executable file is single-threaded, meaning it will occupy a single processor on the computer; multiple independent instances of the XRedDesign-based program can be run on the same computer if it contains multiple processors. For example, on a two-processor computer, one can compute the SSD scores for two different sample sizes at the same time. This permits the rapid and simultaneous completion of the repetitive computing tasks required by the proposed SSD algorithm. The limited computing resources employed by this work demonstrate that any laboratory can implement the proposed SSD criterion and determine the optimal sample size for a given experiment in a reasonable amount of time. The bulk of computation time was occupied by the computation of the optimal design and the posterior predictive Markov chain diagnostic. The EID-optimal designs computed in this work consumed the second largest amount of computational time; the length of time required to obtain an optimal design was correlated with both the sample size of the experiment and the number of parameters. This occurs primarily because the number of parameters determines the number of nodes in the parameter mesh, which is used to compute the expected information at each design point (a rough sketch of this scaling appears below). For a single parameter, where one can represent the one-dimensional prior distribution with relatively few (1,000) nodes, the optimization proceeds very quickly, maximizing at about two hours for the largest design. However, the three-parameter Hill sigmoid optimal designs require 15,625 nodes (which provides only 25 samples per dimension) and can take significantly longer, requiring several days to compute the largest optimal designs. To ensure the convergence of the EID-optimal design to its global maximum, each design was run through the optimizer algorithm twice, which increased the computation time somewhat, although the design on the second pass converged much faster than the first.
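The mesh-size scaling described above (roughly 1,000 nodes for one parameter versus 25 nodes per axis, or 15,625 nodes, for three parameters) is easy to reproduce. The sketch below builds a discrete mesh over an independent normal prior and uses it to take a weighted expectation of an arbitrary function of the parameters; the helper function, the ±3 standard deviation grid bounds, and the example prior values are all illustrative and are not taken from the XRedDesign implementation.

```python
import numpy as np
from itertools import product

def prior_mesh(means, sds, nodes_per_dim):
    """Discrete mesh over an independent normal prior: node coordinates and
    normalized weights. Node counts grow as nodes_per_dim ** P, which is the
    scaling that drives the design-stage computation time discussed above.
    """
    axes, weights = [], []
    for mu, sd in zip(means, sds):
        x = np.linspace(mu - 3 * sd, mu + 3 * sd, nodes_per_dim)
        w = np.exp(-0.5 * ((x - mu) / sd) ** 2)       # unnormalized normal density
        axes.append(x)
        weights.append(w)
    nodes = np.array(list(product(*axes)))             # all P-dimensional grid points
    w = np.array([np.prod(c) for c in product(*weights)])
    return nodes, w / w.sum()

# One parameter with 1,000 nodes versus three parameters with 25 nodes per axis.
theta1, w1 = prior_mesh([1.0], [0.2], 1000)
theta3, w3 = prior_mesh([1.0, 0.5, 2.0], [0.2, 0.1, 0.4], 25)
print(len(w1), len(w3))                                # 1000 versus 15625 nodes

# Any expectation over the prior becomes a weighted sum over nodes, e.g. E[theta_0 * theta_1].
print(float(np.sum(w3 * theta3[:, 0] * theta3[:, 1])))
```

Because the node count grows as the per-axis resolution raised to the number of parameters, this expectation step, and the EID-optimal design search built on top of it, is the piece that becomes unwieldy beyond three parameters.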
The complexity of the experimental model also has some effect on the computation time for the EID-optimal design. Most of the models used in this work incorporate a simple framework that directly relates the input to the output and evaluates very quickly. However, as discussed previously, the proposed SSD method employs a very high-level abstraction of a model that can be as complicated as the numerical solution to a system of differential or linear equations, or the root of a polynomial. For illustration purposes, this work computed the optimal designs for sample sizes up to thirty times the number of parameters. Fortunately, one need not compute all of the EID-optimal designs for a given experiment in order to determine the optimal sample size. In practice, the researcher should compute the optimal designs and the precision for the sample size decision in alternating steps and stop the procedure when the algorithm determines the optimal sample size for the most precise control region (or the maximum desired precision). For example, trial 2:01 could have been stopped at 14 or 18 samples, depending on the desired value of p_t'. The designs for the remaining sample sizes need not be computed, which can save a great deal of time. The MPSRF posterior predictive diagnostic described in Section 4.4 required the longest computation time. This is due to the sheer computational volume required to compute the reduction factor of each Markov chain. For each chain generated from a single observation vector, z (as is the case with Bayesian inference), the MPSRF requires m parallel chains started from different points at the edge of the prior distribution, where m was set to the larger of 2^P or five chains. These chains generate a single series of reduction factors. The additional step added by employing the preposterior distribution expands the computation time by a factor of K = 5000. Therefore, each evaluation of the MPSRF diagnostic for the design scenario requires the generation and analysis of mK individual Markov chains. Depending on the number of parameters, the length of the Markov chain, and the sample size, which all increase the computation time, these diagnostics can take up to a week each to compute. This was the motivating factor for employing an offline method for computing the MPSRF rather than computing it for each individual Markov chain used in the SSD algorithm; if the diagnostic were computed for every one of these chains, the computation time would be prohibitively long. Perhaps ironically, the computation of the optimal sample size requires the least amount of time and resources of all of the computational stages. The bulk of this method involves the computation of K = 10,000 to 25,000 posterior predictive Markov chains; computing each chain proceeds very quickly regardless of the number of parameters. The limiting factor at this step is the complexity of the model functions, which affects the speed at which the computer can compute the likelihood ratio of a given pair of parameter vectors. Once the program generates each chain, its analysis is a simple task of cycling through the links and determining whether each falls within the ellipsoidal control region. This process can be done at each link for multiple control regions with little additional computation. Some variability exists between models and numbers of parameters, but each of the K Markov chains consumes about one second of computation time.
Therefore, the longest SSD computations require fewer than eight hours per sample size. The post-chain part of the SSD algorithm proceeds very quickly; the expectation reduces to a simple average and the one-dimensional integral required to compute the risk can be numerically determined very quickly by using the trapezoid method for a relatively dense (1000 samples) grid. After this discussion of all of the computation time required by the proposed SSD method, one might infer that it is prohibitively expensive to employ in a laboratory setting. However, this is definitely far from the truth. The aforementioned computation times refer to single-threaded processes running on a single processor, similar to the concept of “man-hours” to describe an amount of physical labor. As discussed above, since the XRedDesign software computes each stage of SSD independently of the others, one can easily separate the tasks of computing the EID-optimal designs, MPSRF diagnostics, and SSD decision criteria into separate processes that run independently on their own processors, providing a linear speedup based on the number of processors. One can even choose to run computation for every other sample size on parallel processors. For example, on a dual-core machine, a researcher could compute the odd sample sizes on one processor and the even sample sizes on the other, cutting actual time in half. A quad- or eight-core 213 machine could further decrease the required computation time without incurring the penalty usually attributed to multithreaded processes. Considering the prominence of multi-core personal computers, one can shorten the actual time required to compute the optimal design and SSD score for an experiment. This provides a superior method to perform sample size reduction of an experiment over traditional parallelization techniques and ensures that one can perform SSD for many experiments in a reasonably short time. As computers continue to improve in speed and computational power, the time to compute the optimal sample size will reduce even further. To further aid in automation, the XRedDesign framework supports batch scripting to run multiple trials without user intervention; the researcher must simply write a script file that contains the necessary parameters for the design procedure and the computer will iteratively progress through each computation and terminate when all have been completed. This allows the SSD procedure to be run overnight, on weekends and holidays, or any other “down time” that might not otherwise be available for research by a human being. Consequently, much of the time required to generate optimal sample sizes for an experiment can occur invisibly, when the human researchers are unable to work. One can see that in spite of the computation time required to carry out the proposed SSD algorithm, this work manages each of the pitfalls of computer simulation (cost of computers, awkward parallelization, computing time, et cetera) and provides a cohesive and easily implementable solution that will fit into any laboratory environment. 214 Chapter 8 Outlook 8.1 Conclusions This work proposes a novel algorithm for sample size determination that uses posterior predictive Markov chain simulation to determine the sample sizes that provide a given degree of estimator precision. The proposed pair of SSD decision rules combines previous work in optimal experiment design and Bayesian experimentation with statistical concepts used in quality control and acceptance sampling. 
Each rule evaluates potential experiments and determines when the estimator precision has increased to a predetermined threshold. This works by examining a series of experiment designs of increasing sample sizes and determining the expected precision and risk of failure for each design. This work defines the optimal sample size as the experimental design with the fewest observations that manages to satisfy the required decision rule. By generating a population of parallel Markov chains from the preposterior distribution, the algorithm can produce a series of marginal posterior distributions that represent the realm of potential experimental outcomes. This ensures that the design methodology is robust to the actual set of observations seen in the experiment. This work shows that the proposed method of sample size determination can compute optimal sample sizes that dependably yield a researcher-specified degree of estimator precision, allowing one to reliably reduce the number of observations required to compute precise parameter estimates. To study the behavior of the proposed algorithm and demonstrate its suitability for use in the laboratory environment, a two-stage evaluation was performed that employed six unique experimental models using different prior distributions and experimental error. This work showed that the optimal sample size for a given experiment is adversely affected by increasing either the prior covariance or the variance of the measurement error. In addition, it was shown that high measurement error by itself increases the minimum sample size more dramatically than a high prior covariance by itself; the combination of high measurement 215 error and high prior covariance proved devastating to the SSD process. Therefore, it is very important for a researcher to accurately describe the prior distribution and measurement error when using this SSD algorithm in practice. The proposed algorithm for sample size determination was applied to three real- world experimental problems, and the point estimates and estimator precision using the optimal sample sizes was shown to be comparable to larger, naïvely-designed experiments. This proves that one can use a combination of optimal design and SSD to reduce the sample size of a Bayesian experiment. Finally, to implement the massive computations required to implement these decision criteria, an open- source software package called XRedDesign was developed. This program presents the experimenter with a flexible application programming interface within which an investigator can easily compute optimal designs and optimal sample sizes for any Bayesian experimental framework, and even define new experimental models, prior distribution types, and optimal design criteria. This allows one to apply the software to any experimental scenario. This work demonstrated that one could use this software on a variety of inexpensive, off-the-shelf personal computers to streamline the design of experiments and perform sample size determination under limited computation time. 8.2 Advantages of the Method The evaluation phase of this work demonstrates some key advantages of the proposed SSD method and illustrates many instances in which the use of an optimal sample size can dramatically improve the efficiency of the experimentation without sacrificing precision in the results. 
Of particular importance is the fact that this algorithm does not hinge on considerations of cost or utility, which alleviates the need for the researcher to place a price on a given result. This sidesteps many of the moral dilemmas that may occur in biomedical engineering and medicine when one must assign a cost to a decision that may result in injury or loss of life. Rather, this method of sample size determination predicts the expected precision of the experimental result and bases the optimal sample size on the number of observations that yield (on average or at minimum) a given threshold of estimator precision. The evaluation stage of this work demonstrates that the credibility intervals from the parameter estimates provided by the experiments at the optimal 216 sample sizes reliably correspond to their respective precision requirements. By focusing on the results of the experiment and not its costs, a researcher can streamline an experiment to a specific degree of precision without compromising its objective. This work subjected the proposed SSD method to a number of different experimental models of different shapes and number of parameters. This included both models that solve differential equations and those commonly in use in practical laboratory experiments. In each case, the method was able to determine optimal sample sizes that correspond to a reasonable number of observations and provide the correct estimator precision. One can apply this SSD method to all types of experimental models, including nonlinear systems and models that incorporate multiple mathematical operators and equations. This is because Markov chain methods employ a purely numerical technique that evaluates an experimental model repeatedly for single values of input and parameters and generates a population of response values, which are used to make inferences about the system. Therefore, a researcher can employ literally any experimental model with the proposed SSD method. Finally, this work shows that a researcher can confidently reduce the sample size of a practical laboratory experiment using the proposed SSD method. In each of the practical trials, this work showed that the optimal sample size provided a set of parameter estimates whose precision rivaled that of the original (and larger) naïve design. The point estimates provided by each reduced experiment also correspond very closely with those of the original design. In addition, the sample sizes provided by the proposed algorithm allow the researcher to specify a required degree of estimator precision, rather than simply guessing, as with the naïve experimental design. The practical application of the proposed SSD method to a realistic research scenario is its ultimate success and opens the door to its widespread use in the laboratory setting. This should reduce the various time and monetary costs associated with experimentation and free up these resources for further experimentation. 217 8.3 Limitations of the Method While the implementation of the proposed algorithm for sample size determination provided good results, it illustrated some potential pitfalls that the wise researcher must endeavor to avoid. As with all Bayesian methods, the investigator must correctly identify the Bayesian framework of the experiment in order to correctly and adequately describe the experiment. 
Since the values in a posterior Markov chain are drawn from the prior distribution, one must select the prior distribution type, mean, and covariance so that the actual parameter value falls within it, preferably toward the center of the distribution. In particular, one must take extreme care to ensure that as the number of parameter increases, the optimal sample sizes prescribed by the algorithm remain at a reasonable level. This requires careful consideration of the prior information of the experiment in order to ensure that one does not overestimate the covariance of the prior distribution and handicap the SSD computation. In addition, the researcher must systematically eliminate the sources of experimental error as completely as possible, since this provides the greatest hindrance to the reduction of the experimental sample size. Likewise, since one simulates the observations that provide the seed for each Markov chain, the preposterior distribution that generates these observations must envelop every possible observation that might occur in the experiment. This includes proper selection of the variance model and parameters; the researcher must strive to err on the side of caution by selecting a variance parameter that slightly overestimates the experimental error. This ensures that the true variance is represented in the preposterior distribution without hindering the convergence of the estimator precision to its required threshold. As long as the researcher properly defines the prior and preposterior distributions, the SSD results that the algorithm provides will lead to experiments that yield precise results. As long as the researcher adequately characterizes the Bayesian framework for the experiment, the proposed SSD algorithm displays few other limitations that would prevent its widespread use in the laboratory environment. 218 8.4 Future Work and Directions of Research Although none of the experimental models used in this work have more than three informative parameters, one can apply the proposed method to experiments with any number of informative parameters. The primary limitation of this work is the computation of optimal designs for large P, because the high- dimension expectation becomes extremely difficult to compute. The Monte Carlo approach fails to remain stationary for more than a single parameter and the discrete mesh approach used in this work does not effectively support more than three parameters; alternative methods of computing the expectation over the prior distribution would release this limitation and allow the computation of higher dimensional designs. This will allow the application of the proposed algorithm to a much wider range of experimental problems in biomedical engineering and medicine, many of which have five or more parameters of interest. The natural progression of computing technology and the advent of new computers will provide a degree of passive improvement of the implementation of this work in the laboratory. As computer speed increases, it becomes possible to increase the number of parallel Markov chains (K) in the sample size determination. Since this work intends each Markov chain to represent a possible experimental outcome, using more chains covers more possibilities, and provides for a more accurate preposterior description of the experiment. Even for the current values of K, improvements in computer technology will eventually trivialize the computation times experienced in this work. 
Increases in computational power will also allow the implementation of more complex models, such as those with open-form model equations that require a root-finding algorithm, or even experiments whose model outputs are described by a system of analytically intractable differential equations. While the current framework permits these types of models, their computational complexity requires excessive computation time.

Along the same lines, the XRedDesign program can be modified to take advantage of parallel processing on computers with multiple processors. This can be carried out with relative ease using a method such as OpenMP (Itzkowitz et al., 2007), or by more involved and more powerful optimizations. A multiple-processor optimization is not critical on computers with two to four cores, as it is just as easy to run multiple simultaneous instances of the XRedDesign program on the same machine with different sample sizes. However, parallelization will become more important as the number of processors in personal computers increases beyond the point at which the single-run-per-processor model is practical.

Finally, the method of generating multiple parallel Markov chains that reflect the marginal posterior distribution could be used as the basis of a new optimal design criterion; rather than using preposterior chains simply to determine the optimal sample size, they might also be employed to determine the optimal sample locations within the design space. Since (3.03) describes the information generated by an experiment in terms of the expected distance between the prior and posterior distributions over z, one can derive an information criterion using parallel Markov chains. The major challenge would be employing enough Markov chains to represent a marginal posterior distribution over z that is stationary enough to optimize. This would require massive computing power, ultimately resulting in a criterion that is similar in complexity to the proposed SSD criterion but evaluated thousands of times to find the global optimum. It may be some time before computers exist that can effectively carry out these computations.

References

ACHAR, JORGE ALBERTO. (1984). Use of Bayesian Analysis to Design of Clinical Trials with One Treatment. Communications in Statistics: Theory and Methods, 13(14): 1693-1707.
ADCOCK, C.J. (1997A). Sample Size Determination: A Review. The Statistician, 46(2): 261-283.
ADCOCK, C.J. (1997B). The Choice of Sample Size and the Method of Maximum Expected Utility - Comments on the Paper by Lindley. The Statistician, 46(2): 155-162.
ATKINSON, A.C. AND DONEV, A.N. (1992). Optimum Experimental Designs. Clarendon Press, Oxford, England.
BARD, YONATHAN. (1974). Nonlinear Parameter Estimation. Academic Press, New York, NY.
BEKEY, G.A.; GRAN, M.H.; SABROFF, A.E.; AND WONG, A. (1966). Parameter Estimation by Random Search Using Hybrid Computer Techniques. Proceedings of the 1966 Fall Joint Computer Conference, 191-200.
BEKEY, GEORGE A. AND UNG, MAN T. (1974). A Comparative Evaluation of Two Global Search Algorithms. IEEE Transactions on Systems, Man, and Cybernetics, 4(1): 112-116.
BEKEY, GEORGE A. AND YAMASHIRO, STANLEY M. (1976). Parameter Estimation in Mathematical Models of Biological Systems. Advances in Biomedical Engineering, 6: 1-43.
BEKEY, GEORGE A. AND MASRI, SAMI F. (1983). Random Search Techniques for Optimization of Nonlinear Systems with Many Parameters. Mathematics and Computers in Simulation, 25(3): 210-213.
BELLMAN, RICHARD; KALABA, ROBERT; AND MIDDLETON, DAVID. (1961). Dynamic Programming, Sequential Estimation and Sequential Detection Processes. Proceedings of the National Academy of Sciences of the United States of America, 47(3): 338-341.
BERGER, JAMES O. (1980). Statistical Decision Theory: Foundations, Concepts, and Methods. Springer-Verlag, New York, NY.
BERNE, ROBERT M. AND LEVY, MATTHEW N. (1997). Cardiovascular Physiology, 7th ed. Mosby-Year Book, Inc., St. Louis, MO: 78-79.
BEZEAU, MARY AND ENDRENYI, LASZLO. (1986). Design of Experiments for the Precise Estimation of Dose-Response Parameters: The Hill Equation. Journal of Theoretical Biology, 123(4): 415-430.
BLACKWELL, DAVID. (1953). Equivalent Comparison of Experiments. Annals of Mathematical Statistics, 24(2): 265-272.
BOX, G.E.P. AND MULLER, MERVIN E. (1958). A Note on the Generation of Random Normal Deviates. The Annals of Mathematical Statistics, 29(2): 610-611.
BOX, G.E.P. AND LUCAS, H.L. (1959). Design of Experiments in Nonlinear Situations. Biometrika, 46(1-2): 77-90.
BOX, GEORGE E.P. AND HILL, WILLIAM J. (1974). Correcting Inhomogeneity of Variance with Power Transformation Weighting. Technometrics, 16(3): 385-389.
BOX, GEORGE; JENKINS, GWILYM M.; AND REINSEL, GREGORY C. (1994). Time Series Analysis: Forecasting and Control, 3rd ed. Prentice Hall, Englewood Cliffs, NJ.
BROOKS, R.J. (1987). On the Design of Comparative Lifetime Studies. Communications in Statistics: Theory and Methods, 16(5): 1221-1240.
BROOKS, S. AND GELMAN, A. (1998). General Methods for Monitoring Convergence of Iterative Simulations. Journal of Computational and Graphical Statistics, 7: 434-455.
BROOKS, STEPHEN P. AND ROBERTS, GARETH O. (1998). Convergence Assessment Techniques for Markov Chain Monte Carlo. Statistics and Computing, 8: 319-335.
CAREY, DANIEL G.; SCHWARZ, LESLIE A.; PLIEGO, GERMAN J.; AND RAYMOND, ROBERT L. (2005). Respiratory Rate is a Valid and Reliable Marker for the Anaerobic Threshold. Journal of Sports Science and Medicine, 4: 482-488.
CHALONER, KATHRYN AND LARNTZ, KINLEY. (1989). Optimal Bayesian Design Applied to Logistic Regression Experiments. Journal of Statistical Planning and Inference, 21(2): 191-208.
CHALONER, KATHRYN AND VERDINELLI, ISABELLA. (1995). Bayesian Experimental Design: A Review. Statistical Science, 10(3): 273-304.
CRESSIE, N.A.C. AND KEIGHTLEY, D.D. (1981). Analyzing Data from Hormone-Receptor Assays. Biometrics, 37(2): 235-249.
DASGUPTA, ANIRBAN AND MUKHOPADHYAY, SAURABH. (1994). Uniform and Subuniform Posterior Robustness: the Sample Size Problem (with discussion). Journal of Statistical Planning and Inference, 40(2-3): 189-204.
DESANTIS, FULVIO; PACIFICO, MARCO PERONE; AND SAMBUCINI, VALERIA. (2004). Optimal Predictive Sample Size for Case-Control Studies. Applied Statistics, 53(3): 427-441.
DUNCAN, ACHESON J. (1959). Quality Control and Industrial Statistics. Richard D. Irwin, Inc., Homewood, IL.
FISHER, R.A. (1950). Contributions to Mathematical Statistics, Papers 10, 11, and 38. John Wiley and Sons, West Sussex, England.
FRASER, D.A.S. AND GUTTMAN, IRWIN. (1956). Tolerance Regions. The Annals of Mathematical Statistics, 27(1): 162-179.
FUKUNAGA, K. (1972). Introduction to Statistical Pattern Recognition. Academic Press, New York, NY.
GAMERMAN, DANI. (1997). Markov Chain Monte Carlo. Chapman and Hall, London, England.
GELFAND, SAUL B. AND MITTER, SANJOY K. (1991). Recursive Stochastic Algorithms for Global Optimization in R^d. SIAM Journal of Control and Optimization, 29(5): 991-1018.
GELMAN, ANDREW AND RUBIN, DONALD B. (1992). Inference from Iterative Simulation Using Multiple Sequences. Statistical Science, 7(4): 457-472.
GEYER, C.J. (1992). Practical Markov Chain Monte Carlo (with discussion). Statistical Science, 7(4): 473-511.
GRIMMETT, G.R. AND STIRZAKER, D.R. (1992). Probability and Random Processes, Second Edition. Oxford University Press, New York, NY.
HASTINGS, W.K. (1970). Monte Carlo Sampling Methods Using Markov Chains and Their Applications. Biometrika, 57(1): 97-109.
HILL, A.V. (1910). The Possible Effects of the Aggregation of the Molecules of Haemoglobin on its Dissociation Curves. Journal of Physiology, 40: iv-vii.
HOARE, C.A.R. (1961). Partition: Algorithm 63, Quicksort: Algorithm 64, and Find: Algorithm 65. Communications of the ACM, 4(7): 321-322.
HOEL, P.G. (1954). Introduction to Mathematical Statistics. John Wiley and Sons, West Sussex, England.
HOLFORD, N.H.G. AND SHEINER, L.B. (1981). Understanding the Dose-Effect Relationship: Clinical Application of Pharmacokinetic-Pharmacodynamic Models. Clinical Pharmacokinetics, 6(6): 429-453.
HUANG, T.C. (1967). Engineering Mechanics, Volume II: Dynamics. Addison-Wesley Publishing Co., Reading, MA.
ITZKOWITZ, MARTY; MAZUROV, OLEG; COPTY, NAWAL; AND LIN, YUAN. (2007). An OpenMP Runtime API for Profiling. Sun Microsystems White Paper.
JOHNSON, RICHARD A. (1994). Miller and Freund's Probability and Statistics for Engineers, Fifth Edition. Prentice Hall, Englewood Cliffs, NJ.
JOSEPH, LAWRENCE; WOLFSON, DAVID B.; AND DU BERGER, ROXANE. (1995). Sample Size Calculations for Binomial Proportions via Highest Posterior Density Intervals. The Statistician, 44(2): 143-154.
KANG, DONGWOO. (2000). Bayesian Inference Using Markov Chain Monte Carlo Methods in Pharmacokinetic/Pharmacodynamic Systems Analysis. Ph.D. Dissertation, Los Angeles, CA: University of Southern California.
KASS, ROBERT E.; CARLIN, BRADLEY P.; GELMAN, ANDREW; AND NEAL, RADFORD M. (1998). Markov Chain Monte Carlo in Practice: A Roundtable Discussion. The American Statistician, 52(2): 93-100.
KHINKIS, LEONID A.; LEVASSEUR, LAURENCE; FAESSEL, HELENE; AND GRECO, WILLIAM R. (2003). Optimal Design for Estimating Parameters of the 4-Parameter Hill Model. Nonlinearity in Biology, Toxicology, and Medicine, 1(3): 363-377.
KHOSLA, DEEPAK. (2007). Information Systems and Sciences Laboratory, HRL Laboratories, L.L.C. Personal correspondence.
KREYSZIG, ERWIN. (1999). Advanced Engineering Mathematics, Eighth Edition. John Wiley and Sons, New York, NY: 291-292.
KULLBACK, S. AND LEIBLER, R.A. (1951). On Information and Sufficiency. Annals of Mathematical Statistics, 22(1): 79-86.
LAGARIAS, JEFFREY C.; REEDS, JAMES A.; WRIGHT, MARGARET H.; AND WRIGHT, PAUL E. (1998). Convergence Properties of the Nelder-Mead Simplex Method in Low Dimensions. SIAM Journal on Optimization, 9(1): 112-147.
LATHI, B.P. (1992). Linear Systems and Signals. Berkeley-Cambridge Press, Carmichael, CA.
LIBBY, W.F. (1955). Radiocarbon Dating, Second Edition. University of Chicago Press, Chicago, IL.
LINDLEY, D.V. (1956). On a Measure of the Information Provided by an Experiment. Annals of Mathematical Statistics, 27(4): 986-1005.
LINDLEY, D.V. AND BARNETT, B.N. (1965). Sequential Sampling: Two Decision Problems with Linear Losses for Binomial and Normal Random Variables. Biometrika, 52(3-4): 507-532.
LINDLEY, DENNIS V. (1997A). The Choice of Sample Size. The Statistician, 46(2): 129-138.
LINDLEY, DENNIS V. (1997B). The Choice of Sample Size - A Reply to the Discussion. The Statistician, 46(2): 163-166.
MAAREK, JEAN-MICHEL I.; HOLSCHNEIDER, DANIEL P.; AND HARIMOTO, JUJI. (2001). Fluorescence of Indocyanine Green in Blood: Intensity Dependence on Concentration and Stabilization with Sodium Polyaspartate. Journal of Photochemistry and Photobiology B: Biology, 65: 157-164.
MAAREK, JEAN-MICHEL I.; HOLSCHNEIDER, DANIEL P.; HARIMOTO, JUJI; YANG, JUN; SCREMIN, OSCAR U.; AND RUBENSTEIN, EDUARDO H. (2004). Measurement of Cardiac Output with Indocyanine Green Transcutaneous Fluorescence Dilution Technique. Anesthesiology, 100: 1476-1483.
MARIK, PAUL E. (1999). Pulmonary Artery Catheterization and Esophageal Doppler Monitoring in the I.C.U. Chest, 116: 1085-1091.
MASRI, SAMI F.; BEKEY, GEORGE A.; AND SAFFORD, F.B. (1980). A Global Optimization Algorithm Using Adaptive Random Search. Applied Mathematics and Computation, 7(4): 353-375.
MATH WORKS. (1994). MATLAB. The Math Works, Natick, MA.
METROPOLIS, NICHOLAS; ROSENBLUTH, ARIANNA W.; ROSENBLUTH, M.N.; AND TELLER, A.H. (1953). Equation of State Calculations by Fast Computing Machines. Journal of Chemical Physics, 21(6): 1087-1092.
MILLER, KENNETH S. (1975). Multivariate Distributions. Robert E. Krieger Publishing Company, Huntington, NY.
MÜLLER, PETER; PARMIGIANI, GIOVANNI; ROBERT, CHRISTIAN; AND ROUSSEAU, JUDITH. (2004A). Optimal Sample Size for Multiple Testing: the Case of Gene Expression Microarrays. Journal of the American Statistical Association, 99(468): 990-1001.
MÜLLER, PETER; SANSÓ, BRUNO; AND DEIORIO, MARIA. (2004B). Optimal Bayesian Design by Inhomogeneous Markov Chain Simulation. Journal of the American Statistical Association, 99(467): 788-798.
NELDER, J.A. AND MEAD, R. (1965). A Simplex Method for Function Minimization. Computer Journal, 7(4): 308-313.
NORMAND, SHARON-LISE T. AND ZOU, KELLY H. (2002). Sample Size Considerations in Observational Health Care Quality Studies. Statistics in Medicine, 21(3): 331-345.
PEZESHK, H. (2003). Bayesian Techniques for Sample Size Determination in Clinical Trials: A Short Review. Statistical Methods in Medical Research, 12(6): 489-504.
PHAM-GIA, T. (1997). On Bayesian Analysis, Bayesian Decision Theory and the Sample Size Problem. The Statistician, 46(2): 139-144.
PINSKY, MICHAEL R. (2002). Functional Hemodynamic Monitoring. Intensive Care Medicine, 28: 1229-1232.
PRESS, WILLIAM H.; TEUKOLSKY, SAUL A.; VETTERLING, WILLIAM T.; AND FLANNERY, BRIAN P. (1997). Numerical Recipes in C, 2nd ed. Cambridge University Press, New York, NY.
PRICE, NICHOLAS C. AND STEVENS, LEWIS. (1989). Fundamentals of Enzymology, Second Edition. Oxford University Press, Oxford, England.
PRONZATO, LUC AND WALTER, ERIC. (1985). Robust Experiment Design via Stochastic Approximation. Mathematical Biosciences, 75(1): 103-120.
PRONZATO, LUC AND WALTER, ERIC. (1993). Experimental Design for Estimating the Optimal Point in a Response Surface. Acta Applicandae Mathematicae, 33(1): 45-68.
QUINTANA, FERNANDO A. AND MÜLLER, PETER. (2004). Optimal Sampling for Repeated Binary Measurements. Canadian Journal of Statistics - Revue Canadienne de Statistique, 32(1): 74-84.
RAIFFA, HOWARD AND SCHLAIFER, ROBERT. (1961). Applied Statistical Decision Theory. Harvard University Graduate School of Business Administration, Boston, MA.
ROBERTS, GARETH O. (1996). Markov Chain Concepts Related to Sampling Algorithms. Markov Chain Monte Carlo in Practice, Gilks, W.R.; Richardson, S.; and Spiegelhalter, D.J., eds. Chapman and Hall/CRC Press, Boca Raton, FL: 45-57.
SÄRKKÄ, SIMO AND VEHTARI, AKI. (2004). MCMC Diagnostics Toolbox for MATLAB version 6.X. Laboratory of Computational Engineering, Helsinki University of Technology, Finland.
SEBER, G.A.F. AND WILD, C.J. (1989). Nonlinear Regression. John Wiley and Sons, New York, NY.
SHANNON, C.E. (1948). A Mathematical Theory of Communication. The Bell System Technical Journal, 27(3): 379-423, 623-656.
SILVERMAN, B.W. (1992). Density Estimation for Statistics and Data Analysis. Chapman and Hall/CRC Press, Boca Raton, FL.
SONG, DALE AND WONG, WENG KEE. (1998). Optimal Two-point Designs for the Michaelis-Menten Model with Heteroscedastic Errors. Communications in Statistics: Theory and Methods, 27(6): 1503-1516.
SYLVESTER, RICHARD J. (1988). A Bayesian Approach to the Design of Phase II Clinical Trials. Biometrics, 44(3): 823-836.
THOMAS, ROLAND AND ROSA, ALBERT J. (2003). The Analysis and Design of Linear Circuits, Fourth Edition. John Wiley and Sons, New York, NY.
TIERNEY, L. (1994). Markov Chains for Exploring Posterior Distributions (with discussion). Annals of Statistics, 22(4): 1701-1762.
VANDERBILT, DAVID AND LOUIE, STEVEN. (1984). A Monte Carlo Simulated Annealing Approach to Optimization Over Continuous Variables. Journal of Computational Physics, 56(2): 259-271.
WALD, ABRAHAM. (1950). Statistical Decision Functions. John Wiley and Sons, New York, NY.
WALTER, E. AND PRONZATO, L. (1987). Optimal Experiment Design for Nonlinear Models Subject to Large Prior Uncertainties. American Journal of Physiology, 253(3): R530-R534.
WANG, FEI AND GELFAND, ALAN E. (2002). A Simulation-based Approach to Bayesian Sample Size Determination for Performance under a Given Model and for Separating Models. Statistical Science, 17(2): 193-208.
WEST, JOHN B. (1990). Respiratory Physiology. Williams and Wilkins Publishing, Baltimore, MD: 21-26, 169-170.
WONG, KAU-FUI VINCENT. (2003). Intermediate Heat Transfer. Marcel Dekker, Inc., New York, NY.
YAMAMOTO, Y.; MIYASHITA, M.; HUGHSON, R.; TAMURA, S.; SHINOHARA, M.; AND MUTOH, Y. (1991). The Ventilatory Threshold Gives Maximal Lactate Steady State. European Journal of Applied Physiology, 63: 55-59.
YAMASHIRO, STANLEY M. (2006). Department of Biomedical Engineering, University of Southern California. Personal correspondence.

Appendix A: XRedDesign API Documentation

A software package called XRedDesign was programmed for this work, consisting of an application programming interface (API) that facilitates setting up experiments for design and analysis. This appendix contains documentation for the XRedDesign API generated from the commented source code using Dimitri van Heesch's Doxygen program (http://www.doxygen.org). It provides a functional description of the interface, which the author hopes will encourage other researchers to apply this software to their own research objectives once it is made available in open-source format.
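Before the class-by-class reference, a brief illustration of the main extension point may be useful. The hedged sketch below derives a hypothetical experimental model from the CExpModel base class documented in this appendix; the Michaelis-Menten response, the member initializations, and the CVector element access through operator[] are assumptions made for illustration and are not part of the package.

```cpp
// Hypothetical example of extending the API with a user-defined experimental
// model, following the CExpModel conventions documented below. The model form,
// the member initializations, and the CVector element access are illustrative
// assumptions rather than part of this work.
#include "models.h"

class CMichaelisMenten : public CExpModel
{
public:
    CMichaelisMenten()
    {
        m_szName     = "Michaelis-Menten";    // friendly name of the model
        m_nInputs    = 1;                     // single input (substrate concentration)
        m_nAlphas    = 2;                     // informative parameters: Vmax and Km
        m_nThetas    = 0;                     // no fixed parameters
        m_Constraint = POSITIVE;              // inputs restricted to positive values
    }

    // Relate the response prediction to the input stimulus and parameter set
    double eval(const CVector &vX, const CParam &a) const
    {
        const CVector &alpha = a.getAlpha();  // (Vmax, Km)
        double x = vX[0];                     // assumed element access
        return alpha[0] * x / (alpha[1] + x); // y = Vmax * x / (Km + x)
    }
};
```

An object of such a class would then be supplied to the CExperiment constructor or to CExperiment::setExpModel in the same way as the built-in models.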
Index of XRedDesign API Base Classes

CExperiment
CParam
CExpModel
CVarModel
CPrior
CMesh
CInfoCriterion
CSizeCriterion
CControlRegion
CVector
CMatrix
CMarkovChain
arrayMC
optsFZ
optsMC
optsNM
optsRCr
statsOpt
Additional Functions in extras.h

Author: Programmed by David J. Huber <huber@usc.edu> Date: Copyright 2003-2007 David J.
Huber 229 CExperiment Class Reference #include <experiment.h> Public Member Functions Construction, Destruction, and Assignment • CExperiment () • CExperiment (const CExperiment &e) • CExperiment (CExpModel *mod, CVarModel *var, uint N=0) • virtual ~CExperiment () Initialization of the Experiment • void setModels (CExpModel *mod=NULL, CVarModel *var=NULL) • void setExpModel (CExpModel *mod) • void setVarModel (CVarModel *var) • void setSize (uint nSize) • void setX (const CMatrix &mX0) • void setMaxInput (const CVector &vXmax) • void setPrior (CPrior *prior, const CVector &vMean, const CMatrix &mCov) • void setTheta (const CVector &v) • void setTheta (uint i, const string &szData) • void setSigma (const CVector &v) • void setSigma (uint i, const string &szData) Access to Protected Members • CExpModel * getExpModel (void) const • CVarModel * getVarModel (void) const • uint getSize (void) const • const CMatrix & getX (void) const • uint getInputs (void) const • CPrior * getPrior (void) const • const CVector & getTheta (void) const • double getTheta (uint i) const • const CVector & getSigma (void) const • double getSigma (uint i) const • uint getP (void) const • uint getQ (void) const • uint getS (void) const Output of Results • void setFile (const string &szText) • const string & getFile (void) const • void setVerbose (bool bVerb) • bool getVerbose (void) const Experimental Actions • void designTrad (const CVector &vXmin, const CVector &vXmax, int nRepeat) • double designOpt (CInfoCriterion *info, uint nNodes, const optsRCr &opts, statsOpt *stats) 230 • CVector designSSD (CSizeCriterion *size, CControlRegion *roi, uint nChains, optsMC &opts, const CVector &vPtPrime) • double likelihood (const CParam &a, const CVector &vZ, const CMatrix &mXc) const • double info (CInfoCriterion *info, uint nNodes, const CMatrix &mX) const • void simulate (CVector &vZ, CVector &vAlpha=CVector(), bool bRand=true) const • CParam estimateMAP (const CVector &vZ, optsMC &opts, CMarkovChain &mcChain) • CParam estimateMLE (const CVector &vZ, const optsNM &opts, statsOpt *stats) const • void posterior (const CVector &vZ, const optsMC &opts, CMarkovChain &mcChain) Posterior Markov Chain Diagnostics • void ipsrf (const uint nChains, optsMC opts, arrayU pnLength, arrayD pdRrms, arrayD pdRmean, arrayD pdRstd) const • void ipsrf (const CVector &vZ, optsMC opts, arrayMC &pmcChain, arrayU &pnLength, arrayD &pdR, array2D &pdV, array2D &pdW) const • void mpsrf (const uint nChains, optsMC opts, arrayU &pnLength, arrayD &pdRrms, arrayD &pdRmean, arrayD &pdRstd) const • void mpsrf (const CVector &vZ, optsMC opts, arrayMC &pmcChain, arrayU &pnLength, arrayD &pdR, arrayD &pdV, arrayD &pdW) const Protected Attributes • CExpModel * m_pExpModel The experimental model that generates response predictions • CVarModel * m_pVarModel The variance model that describes the variance of the residual values • CPrior * m_pPrior Pointer to the prior distribution object • CVector m_vTheta Vector containing the known (fixed) parameters for the experiment • CVector m_vSigma Vector containing the variance parameters for the experiment • unsigned int m_nInputs Number of experimental inputs for model (i.e., M) • unsigned int m_nSize Number of observations in design (i.e., N) • unsigned int P Number of informative parameters (i.e., length of alpha) • unsigned int Q Number of fixed parameters (i.e., length of theta) • unsigned int S Number of variance parameters (i.e., length of sigma). 
231 • CMatrix m_mX Design matrix for the experiment (N rows, M columns). • CVector m_vXmax Maximum allowable input value. • string m_szFile Indicates the filename to write the results. • bool m_bVerbose Indicates whether to show progress updates onscreen. Friends • std::ostream & operator << (std::ostream &os, const CExperiment &e) Detailed Description Bayesian Experimentation: Design, Sample Size Determination, and Analysis of Data This is the main class of the XRedDesign API that acts as the hub object of a Bayesian experiment. From this object, one can compute optimal and naive designs for the experiment, perform posterior diagnostics, and determine the optimal sample sizes using the precision-based criteria. Once the researcher has collected data, this object contains functions to estimate the parameters and credibility intervals using MLE or MAP. Constructor & Destructor Documentation CExperiment::CExperiment (CExpModel *mod, CVarModel *var, uint N = 0) Practical Constructor that allows the designation of experimental and variance models, and the sample size for the experiment. Parameters: mod Pointer to the experimental model class for the experiment var Pointer to the variance model class for the experiment N Number of observations in the experiment (optional) CExperiment Member Function Documentation void designTrad (const CVector &vXmin, const CVector &vXmax, int nRepeat) Computes the traditionally determined (naive) design for an experiment between boundaries 'vXmin' and 'vXmax'. The spacing of the design points is either linear or logarithmic, depending on the experimental domain constraint. The number of samples in the design is determined by the value of m_nSize (i.e., N). 232 Parameters: vXmin The minimum experimental input in the design vXmax The maximum experimental input in the design nRepeat Number of times to repeat each measurement (default = 1) double designOpt (CInfoCriterion *info, uint nNodes, const optsRCr &opts, statsOpt *stats) Computes the optimal design of an experiment using one of the optimal design criteria using the random creep global optimization algorithm. The optimal design for the experiment is automatically stored in 'm_mX'. Parameters: info A pointer to the information criterion class to use nNodes The number of nodes in the parameter mesh to compute the expectation opts The options for the random creep optimization algorithm (optional) stats The optimization statistics returned by the optimizer (optional) Returns: The optimal criterion value (information) at the optimal design CVector designSSD (CSizeCriterion *size, CControlRegion *roi, uint nChains, optsMC &opts, const CVector &vRho) Compute the SSD score for a given sample size using this experimental setup. The criterion for the size and region of interest are inputted as pointer arguments. The value of 'nChains' indicates the number of posterior predictive Markov chains to use. The values in 'vRho' set the control region sizes. Parameters: size A pointer to the SSD criterion class to use roi A pointer to the control region shape to use for Markov chain integration nChains The number of parallel posterior predictive Markov chains to use opts The options for generating the posterior Markov chain vRho A vector containing a series of values to integrate each chain over Returns: A vector containing the SSD score (control, risk, etc.) 
for each of the values in vRho double likelihood (const CParam &a, const CVector &vZ, const CMatrix &mXc) const Compute the likelihood of a parameter set given a data set, or a data set given a parameter set. The matrix mXc corresponds to the constrained design; this version of the function is useful for design and Markov chain generation. If the value of mXc is not included, the function uses a constrained version of m_mX. This input pattern is more useful for inference and parameter estimation. Parameters: a The set of parameters to test against vZ The data vector to test against mXc The CONSTRAINED design matrix (optional) 233 Returns: The likelihood value for the given combination of parameters, data, and design matrix double info (CInfoCriterion *info, uint nNodes, const CMatrix &mX) const Compute the information of an experiment at some design matrix based on the indicated information criterion. This is based on the prior distribution, parameters, and experimental and variance models defined in the experiment. If the value of mX is empty, use the design matrix defined by m_mX as the input. Parameters: info A pointer to the information criterion class to use nNodes The number of nodes in the parameter mesh to compute the expectation mX The design matrix to compute information at (optional) Returns: Experimental information provided by the experiment for the indicated criterion void simulate (CVector &vZ, CVector &vAlpha, bool bRand) const Simulate an experiment using the given parameter vector 'vAlpha'. If 'vAlpha' is undefined or NULL, the parameter vector is randomly generated from the prior and is returned through the pointer to 'vAlpha'. Parameters: vZ The data vector returned by the simulation vAlpha As input, defines the parameter vector to simulate from. As output, returns the parameter vector used for random simulation (optional). bRand If true, an alpha vector is randomly generated from the prior, which is equivalent to sampling from the preposterior distribution (default = true). CParam estimateMAP (const CVector &vZ, optsMC &opts, CMarkovChain &mcChain) const Compute the model and variance parameter estimates for each trial using the Bayesian maximum a posteriori probability method. This method computes a Markov chain reflecting the posterior distribution using the data and the likelihood function. The best point estimate for each parameter corresponds to the appropriate mode of the posterior distribution, which is determined by a combination of histogram and density estimation methods using kernels. Parameters: vZ The data vector to use when estimating the parameters opts The options for generating the posterior Markov chain mcChain The posterior Markov chain is returned through this parameter (optional) Returns: A CParam object containing the MAP estimate alongside the theta and sigma vectors 234 CParam estimateMLE (const CVector &vZ, const optsNM &opts, statsOpt *stats) const Compute the maximum likelihood estimate for the experiment by minimizing the negative log of the likelihood score. If the value of 'bEstVar' is true, then the variance parameters will also be estimated along with the alpha vector. This optimization is carried out using the Nelder-Mead simplex using the likelihood function as the objective function. A positivity constraint on the parameters is optionally imposed through the 'optsNM' object. 
Parameters: vZ The data vector to use when estimating the parameters opts The options for the Nelder-Mead optimization algorithm (optional) stats The optimization statistics returned by the optimizer (optional) Returns: A CParam object containing the MLE estimate alongside the theta and sigma vectors void posterior (const CVector &vZ, const optsMC &opts, CMarkovChain &mcChain) const Compute a posterior Markov chain using the current experimental framework given some data vector. Parameters: vZ The data vector to use when computing the Markov chain opts The options for generating the posterior Markov chain mcChain The posterior Markov chain is returned through this parameter void ipsrf (const uint nChains, optsMC opts, arrayU pnLength, arrayD pdRrms, arrayD pdRmean, arrayD pdRstd) const Compute the design version of the interval-based Iterative Potential Scale Reduction Factor (IPSRF) for the current experiment. This routine uses posterior predictive simulation to determine the minimum length of a Markov chain for any potential data set. Parameters: nChains The number of parallel MPSRF's required to detail preposterior convergence (i.e., K) opts The options for generating the posterior Markov chain pnLength Returns an array containing the various chain lengths for evaluation of convergence pdRrms Returns the RMS of the reduction factors at each of the chain lengths (optional) pdRmean Returns the mean of the reduction factors at each of the chain lengths (optional) pdRstd Returns the standard deviation of the reduction factors at each chain length (optional) void ipsrf (const CVector &vZ, optsMC opts, arrayMC &pmcChain, arrayU &pnLength, arrayD &pdR, array2D &pdV, array2D &pdW) const Compute the interval-based Iterative Potential Scale Reduction Factor (IPSRF) for the current set of Markov chains, given the data in 'vZ'. This routine is designed to determine the minimum length of a Markov chain for estimating the parameters. 235 Parameters: vZ The data vector used to generate the set of Markov chains opts The options for generating the posterior Markov chains pmcChain The set of parallel Markov chains is returned through this parameter pnLength Returns an array containing the various chain lengths for evaluation of convergence pdR Returns an array of reduction factors at each of the chain lengths pdV Returns an array of between-chain variances at each of the chain lengths (optional) pdW Returns an array of within-chain variances at each of the chain lengths (optional) void mpsrf (const uint nChains, optsMC opts, arrayU &pnLength, arrayD &pdRrms, arrayD &pdRmean, arrayD &pdRstd) const Compute the Multivariate Potential Scale Reduction Factor (MPSRF) for the experiment using posterior predictive simulation. This routine determines the minimum chain length to represent all possible posterior distributions. 
Parameters: nChains The number of parallel MPSRF's required to detail preposterior convergence (i.e., K) opts The options for generating the posterior Markov chain pnLength Returns an array containing the various chain lengths for evaluation of convergence pdRrms Returns the RMS of the reduction factors at each of the chain lengths (optional) pdRmean Returns the mean of the reduction factors at each of the chain lengths (optional) pdRstd Returns the standard deviation of the reduction factors at each chain length (optional) void mpsrf (const CVector &vZ, optsMC opts, arrayMC &pmcChain, arrayU &pnLength, arrayD &pdR, arrayD &pdV, arrayD &pdW) const Compute the Multivariate Potential Scale Reduction Factor (MPSRF) for the Markov chain for a series of subchains with lengths in multiples of 'nBatchLen'. The distribution of the Markov chain has converged when the value of pdR stabilizes at 1.0. This routine is designed to determine the minimum Markov chain length to estimate the parameters using the observation vector 'vZ'. Parameters: vZ The data vector used to generate the set of Markov chains opts The options for generating the posterior Markov chains pmcChain The set of parallel Markov chains is returned through this parameter pnLength Returns an array containing the various chain lengths for evaluation of convergence pdR Returns an array of reduction factors at each of the chain lengths pdV Returns an array of between-chain variances at each of the chain lengths (optional) pdW Returns an array of within-chain variances at each of the chain lengths (optional) The documentation for this class was generated from the following file: • C:/XRedDesign/experiment.h 236 CParam Class Reference #include <models.h> Public Member Functions Construction, Destruction, and Assignment • CParam () • CParam (const CVector &vA, const CVector &vT, const CVector &vS) • CParam (const CParam &) • virtual ~CParam () • CParam & operator = (const CParam &) Protected Member Access • const CVector & getAlpha (void) const • const CVector & getTheta (void) const • const CVector & getSigma (void) const • void setAlpha (uint i, double fVal) • void setTheta (uint i, double fVal) • void setSigma (uint i, double fVal) • void setAlpha (const CVector &vVec) • void setTheta (const CVector &vVec) • void setSigma (const CVector &vVec) Protected Attributes • CVector m_vAlpha Informative parameter vector. • CVector m_vTheta Known (fixed) parameter vector. • CVector m_vSigma Variance parameter vector. Friends • std::ostream & operator << (std::ostream &os, const CParam &a) Detailed Description Model Parameter container class This class provides a unified structure for all of the different parameter vectors for the evaluation of an experimental or variance model. The class consists of a series of vectors corresponding to the alpha, theta, and sigma parameters and is passed as a single argument into any of the model function 'eval' functions. The documentation for this class was generated from the following file: • C:/XRedDesign/models.h 237 CExpModel Class Reference #include <models.h> Inherited by CAnaerobic, CExpDecay, CFlourescence, CLowPass, CRiseFall, and CSigmoid. Public Types • enum Constraint { NONE = 0, POSITIVE, LOG10 } The domain constraint for the experimental model input. 
Public Member Functions Construction and Destruction • CExpModel () • CExpModel (const CExpModel &mod) • virtual ~CExpModel () User Defined for Each Derived Class • virtual double eval (const CVector &vX, const CParam &a) const =0 • const string & name (void) const Return the friendly name of the experimental model. Protected Member Functions • double fzero (double(*fname)(double, const CParam &, const CVector &), const CVector &v, const CParam &a, const optsFZ &opts, statsOpt *stats=NULL) const Find the bounded zero of a function using Newton-Raphson (for open-form model functions). • double deriv (double(*fname)(double, const CParam &, const CVector &), double fX, const CParam &a, const CVector &v, double *pdFofX=NULL) const Compute the derivative of an objective function (used by CExperiment::fzero) Protected Attributes • string m_szName Friendly name of the specific experimental model. • unsigned int m_nInputs Dimension of the input stimulus for the model. • unsigned int m_nAlphas Number of informative parameters for the model (i.e., P). • unsigned int m_nThetas Number of known (fixed) parameters for the model (i.e., Q). • Constraint m_Constraint Domain constraint employed by the model. 238 Detailed Description Virtual base class for experimental models This class provides a base class for the experimental model from which a user can easily derive custom experimental models. For each of these derived classes, the user must define a default constructor, which assigns a name to the model (in string format) and defines the number of informative and fixed parameters that the model requires. In addition, the user must define a function 'eval', which relates the model prediction to the input stimulus and set of parameters. Constructor & Destructor Documentation CExpModel::CExpModel () Default constructor. This function must be defined by the user for all objects derived from the base class, and initializes the protected attributes using fixed values that correspond to the specific model. CExpModel Member Function Documentation virtual double eval (const CVector &vX, const CParam &a) const [pure virtual] Compute a system response prediction from an input stimulus and parameter set. This function must be defined by the user for all objects derived from the base class. Parameters: vX Input stimulus vector (length of M) a Parameter set to use for the prediction Returns: System response prediction The documentation for this class was generated from the following file: • C:/XRedDesign/models.h 239 CExpDecay Class Reference #include <models.h> Inherits CExpModel. Public Member Functions • double eval (const CVector &vX, const CParam &a) const Detailed Description The exponential decay model (M = 1, P = 1, Q = 1) used in trials 1:01 through 1:04 The documentation for these classes was generated from the following file: • C:/XRedDesign/models.h CRiseFall Class Reference #include <models.h> Inherits CExpModel. Public Member Functions • double eval (const CVector &vX, const CParam &a) const Detailed Description The exponential rise-and-fall model (M = 1, P = 2, Q = 1) used in trials 2:01 through 2:04 The documentation for these classes was generated from the following file: • C:/XRedDesign/models.h CSigmoid Class Reference #include <models.h> Inherits CExpModel. 
Public Member Functions • double eval (const CVector &vX, const CParam &a) const Detailed Description Four-parameter Hill sigmoid (M = 1, P = 3, Q = 1) used in trials 3:01 through 3:04 The documentation for this class was generated from the following file: • C:/XRedDesign/models.h 240 CLowPass Class Reference #include <models.h> Inherits CExpModel. Public Member Functions • double eval (const CVector &vX, const CParam &a) const Detailed Description Sallen-key low pass filter (M = 1, P = 2, Q = 2) The magnitude response function for the Sallen-Key low pass filter with a Butterworth response. This model assumes that the two input and gain resistor values are equal to each other, as are the two capacitors. Input is the frequency in Hertz (Hz). The informative parameters are the input and gain resistance values. The capacitor values are noninformative. The documentation for this class was generated from the following file: • C:/XRedDesign/models.h CFluorescence Class Reference #include <models.h> Inherits CExpModel. Public Member Functions • double eval (const CVector &vX, const CParam &a) const Detailed Description Model for Indocyanine Green fluorescence in whole human blood (M = 1, P = 3, Q = 0) This model follows the rise-and-fall shape seen in the second simulated model. The ICG displays an optimal degree of fluorescence at a moderate concentration with lesser fluorescence at higher and lower concentrations. The documentation for this class was generated from the following file: • C:/XRedDesign/models.h 241 CAnaerobic Class Reference #include <models.h> Inherits CExpModel. Public Member Functions • double eval (const CVector &vX, const CParam &a) const Detailed Description Dual-linear model for determining the anaerobic threshold (M = 1, P = 4, Q = 0) This model consists of a pair of straight lines that intersect at the value of the anaerobic threshold. The goal for this experiment is to determine the AT by estimating the slope and intercept parameters for the model and using these values to determine the point of intersection of the lines. Therefore, this model employs different versions of the MCMC algorithms and prior distributions in order to estimate the AT through the other parameters. The documentation for this class was generated from the following file: • C:/XRedDesign/models.h 242 CVarModel Class Reference #include <models.h> Inherited by CConstVAR, CPowerVAR, and CQuadrVAR. Public Member Functions Construction and Destruction • CVarModel () • CVarModel (const CVarModel &var) • virtual ~CVarModel () User Defined for Each Derived Class • virtual double eval (double fY, const CParam &a) const =0 • const string & name (void) const Protected Attributes • string m_szName Friendly name of the specific variance model (assigned at construction). • unsigned int m_nSigmas Number of variance parameters required by the model (i.e., S). Detailed Description Virtual base class for variance models This class provides a base class for the variance model from which a user can easily derive custom variance models. For each of these derived classes, the user must define a default constructor, which assigns a name to the model (in string format) and defines the number of variance parameters that the model requires. In addition, the user must define a function 'eval', which relates the residual variance prediction to the model prediction and set of variance parameters. Constructor & Destructor Documentation CVarModel::CVarModel () Default constructor. 
This function must be defined by the user for all objects derived from the base class, and initializes the protected attributes using fixed values that correspond to the specific variance model. 243 CVarModel Member Function Documentation virtual double eval (double fY, const CParam &a) const [pure virtual] Compute a residual variance prediction from a model prediction, dY, and parameter set Parameters: fY System response prediction from experimental model a Parameter set to use for the prediction Returns: Residual variance prediction The documentation for this class was generated from the following file: • C:/XRedDesign/models.h 244 ConstVAR Class Reference #include <models.h> Inherits CVarModel. Public Member Functions • double eval (double fY, const CParam &a) const • const string & name (void) const Detailed Description Constant Variance Model The documentation for this class was generated from the following file: • C:/XRedDesign/models.h CPowerVAR Class Reference #include <models.h> Inherits CVarModel. Public Member Functions • double eval (double fY, const CParam &a) const • const string & name (void) const Detailed Description Power Variance Model The documentation for this class was generated from the following file: • C:/XRedDesign/models.h CQuadrVAR Class Reference #include <models.h> Inherits CVarModel. Public Member Functions • double eval (double fY, const CParam &a) const • const string & name (void) const Detailed Description Quadratic (parabolic) Variance Model The documentation for this class was generated from the following file: • C:/XRedDesign/models.h 245 CPrior Class Reference #include <prior.h> Inherited by CLognormalPrior, CNormalPrior, and CUniformPrior. Public Member Functions Construction and Destruction • CPrior () • CPrior (const CVector &vMean, const CMatrix &mCov) • CPrior (const CPrior &pdf) • virtual ~CPrior () User Defined for Each Derived Class • virtual void rand (CVector &vRand) const =0 • virtual double pdf (const CVector &vX) const =0 • virtual void init (const CVector &vMean, const CMatrix &mCov)=0 • virtual void initMesh (CMesh &mesh, const arrayU &pnLen, bool bWarn=true) const =0 • virtual void initMesh (CMesh &mesh, uint nNodes, bool bWarn) const Create a mesh of size 'nNodes' with an equal number of samples for each parameter. Access to Protected Member Attributes • void setParams (const CVector &vMean, const CMatrix &mCov) • const CVector & getMean () const • const CMatrix & getCov () const • virtual const string & name (void) const =0 Protected Attributes • unsigned int P Number of parameters in prior distribution • CVector m_vMean Mean vector for the prior distribution • CMatrix m_mCov Covariance matrix for the prior distribution • string m_szName Friendly name of prior distribution Friends • std::ostream & operator << (std::ostream &os, const CPrior &pdf) Detailed Description Base class for defining multivariate prior distributions 246 This class provides a foundation from which a user can easily define new prior multivariate distribution shapes not already supplied. For each of these derived prior distribution classes, the user must define a random number generator ('rand') and probability density evaluator ('pdf') function. If either function routinely employs vectors or matrices derived from the prior mean and covariance, one should add new members to the derived class and assign these in the constructor to avoid repeated calculations in the evaluation of pdf and random vector generation. 
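As a concrete (and purely hypothetical) illustration of these conventions, the sketch below defines an independent exponential prior in which 'init' caches the rate parameters derived from the prior mean so that 'pdf' and 'rand' avoid repeated calculations. The use of the C++ standard <random> generator, the element access through operator[], and the omission of the 'name' and 'initMesh' overrides (which leaves the class abstract) are simplifications of this sketch, not features of the package.

```cpp
// Hypothetical sketch of a user-defined prior following the CPrior conventions
// described above. The covariance is unused by this simple independent prior,
// and the 'name'/'initMesh' overrides required for a concrete class are omitted.
#include <cmath>
#include <random>
#include "prior.h"

class CExponentialPrior : public CPrior
{
public:
    void init(const CVector &vMean, const CMatrix &mCov)
    {
        setParams(vMean, mCov);             // store mean and covariance in the base class
        m_vRate = getMean();                // assumed copy assignment to size the rate vector
        for (uint i = 0; i < P; ++i)
            m_vRate[i] = 1.0 / getMean()[i];   // cache lambda_i = 1 / mean_i
    }

    double pdf(const CVector &vX) const
    {
        double f = 1.0;                     // product of independent exponential densities
        for (uint i = 0; i < P; ++i)
            f *= m_vRate[i] * std::exp(-m_vRate[i] * vX[i]);
        return f;
    }

    void rand(CVector &vRand) const
    {
        static std::mt19937 gen(12345);     // assumed generator; the package may supply its own
        for (uint i = 0; i < P; ++i)
        {
            std::exponential_distribution<double> d(m_vRate[i]);
            vRand[i] = d(gen);
        }
    }

protected:
    CVector m_vRate;                        // cached rate parameters, one per parameter
};
```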
Constructor & Destructor Documentation CPrior::CPrior (const CVector &vMean, const CMatrix &mCov) Practical Constructor using prior mean and covariance. Parameters: vMean Vector corresponding to the prior mean mCov Matrix corresponding to the prior covariance CPrior Member Function Documentation virtual void rand (CVector &vRand) const [pure virtual] Generate a random vector from the prior distribution Parameters: vRand The random vector is returned through this parameter virtual double pdf (const CVector &vX) const [pure virtual] Compute the probability density function value of the prior distribution at some parameter value. This must be defined for each derived prior type. Parameters: vX The parameter vector at which to compute the pdf Returns: The value of the pdf for the specific distribution type virtual void init (const CVector &vMean, const CMatrix &mCov) [pure virtual] Initialize the prior distribution with a given mean and covariance. This function also initializes any of the secondary objects (vectors, matrices, etc.) that might be regularly used by the given prior. Parameters: vMean Vector corresponding to the prior mean mCov Matrix corresponding to the prior covariance 247 virtual void initMesh (CMesh &mesh, const arrayU &pnLen, bool bWarn) const [pure virtual] Generate a parameter mesh object from the current prior distribution. Parameters: mesh The parameter mesh is returned through this parameter pnLen An array containing the number of samples per parameter (i.e., prod(pnLen) = nNodes) bWarn If true, warning messages will be displayed (default = true) The documentation for this class was generated from the following file: • C:/XRedDesign/prior.h CNormalPrior Class Reference #include <prior.h> Inherits CPrior. Public Member Functions • void rand (CVector &vRand) const • double pdf (const CVector &vX) const • void init (const CVector &vMean, const CMatrix &mCov) • void initMesh (CMesh &mesh, const arrayU &pnLen, bool bWarn=true) const Protected Attributes • double m_fCovInvDet Determinant of the inverse of the covariance matrix (used by pdf). • CMatrix m_mCovInv Inverse of the covariance matrix (used by pdf). • CMatrix m_mCovChol Cholesky decomposition of covariance matrix (used by rand). Detailed Description Multivariate Normal Distribution The documentation for this class was generated from the following file: • C:/XRedDesign/prior.h 248 CLognormalPrior Class Reference #include <prior.h> Inherits CPrior. Public Member Functions • void rand (CVector &vRand) const • double pdf (const CVector &vX) const • void init (const CVector &vMean, const CMatrix &mCov) • void initMesh (CMesh &mesh, const arrayU &pnLen, bool bWarn=true) const Protected Attributes • CNormalPrior m_Nprior The normally distributed base prior (used by rand and pdf). Detailed Description Multivariate Lognormal Distribution The documentation for this class was generated from the following file: • C:/XRedDesign/prior.h CUniformPrior Class Reference #include <prior.h> Inherits CPrior. Public Member Functions • void rand (CVector &vRand) const • double pdf (const CVector &vX) const • void init (const CVector &vMean, const CMatrix &mCov) • void initMesh (CMesh &mesh, const arrayU &pnLen, bool bWarn=true) const Protected Attributes • CVector m_vMinVals • CVector m_vMaxVals Vectors of minimum and maximum values for each parameter. 
Detailed Description Multivariate Uniform Distribution The documentation for this class was generated from the following file: • C:/XRedDesign/prior.h 249 CMesh Class Reference #include <prior.h> Public Member Functions Construction and Destruction • CMesh () • CMesh (const arrayU &pnLen) • virtual ~CMesh () Access to Protected Attributes • uint getNumParams () const Returns the number of parameters in the prior distribution (dimension of the mesh). • uint getNumNodes () const Returns the number of nodes in the parameter mesh. • void getIndex (uint nNode, arrayU &pnIdx) const Returns the index of a given node through the parameter pnIdx. • double getWeight (uint nNode) const Returns the weight of a given node in the mesh. • void getNode (uint nNode, CVector &vAlpha) const Returns the parameter value of a given node through the parameter vAlpha. Protected Attributes • double ** m_pfNode List of parameter values whose combinations form the mesh. • double * m_pfWeight Weight of the corresponding node (sum = 1). • unsigned int * m_pnLength Array containing the number of samples per parameter. • unsigned int m_nParams Number of parameters represented. • unsigned int m_nNodes Total number of nodes in the mesh. Detailed Description The parameter mesh represents a discretized version of the prior distribution and allows the numerical calculation of the expectation operator in low dimensions. It is constructed by a given prior distribution object and generates a series of linearly or logarithmically spaced nodes throughout the parameter space. 250 The nodes are arranged linearly and represent the combinations of parameter values in the discretized prior distribution. Since representing the parameter combination at each node would require an excessive amount of memory space (e.g., a 100x100 mesh requires 20,000 f.p. values to store), these combinations are computed and assembled into a parameter vector on the fly using far less storage (e.g., 200 f.p. values to store the same mesh). The weight of each node is stored separately in an array. An expectation for a function is computed using the mesh by iterating through the nodes, evaluating the function at each parameter combination and multiplying this value by the weight at that node. When the mesh contains a sufficient number of nodes, the expectation is equal to the average of these values. CMesh Member Function Documentation double getWeight (uint nNode) const [inline] The weight of each node represents its probability in the discrete representation of the prior distribution. This is computed by multiplying the continuous pdf at the center of each node with the base area of the region between nodes (think of each node as the center of a tile on a tiled floor). This yields the hypervolume of the multidimensional pdf and represents the prior probability within that tile region, centered at the node. For lognormal priors, these tiles are logarithmically spaced (linear in normal space). All other priors employ linearly spaced nodes and tiles. Parameters: nNode The identifier of the specific node to access (0 through m_nNodes-1) Returns: The weight (discrete probability) of the numbered node void getNode (uint nNode, CVector &vAlpha) const This function allows the access of individual nodes in the parameter mesh so that an objective function can be evaluated. 
One can compute the expectation of a function by multiplying the functional value yielded by a node with the weight of a node and averaging, one can compute the expectation of the function over the probability distribution represented by the parameter mesh. Parameters: nNode The identifier of the specific node to access (0 through m_nNodes-1) vAlpha The specific parameter vector that corresponds to the node The documentation for this class was generated from the following file: • C:/XRedDesign/prior.h 251 CInfoCriterion Class Reference #include <info.h> Inherited by CInfoD, CInfoED, and CInfoEID. Public Member Functions Construction and Destruction • CInfoCriterion () • CInfoCriterion (const CInfoCriterion &info) • ~CInfoCriterion () User Defined for Each Derived Class • const string & name (void) const • virtual void build (const CExperiment &e, uint nMeshSize) • virtual double eval (const CMatrix &mX)=0 • void fisher (const CMatrix &mX, CMatrix &mFisher) Protected Attributes • string m_szName Friendly name of the specific experimental model. • CMesh m_Mesh A parameter mesh object used to compute the expectation over the prior. • CParam m_Param An object to store the theta and sigma vectors and evaluate different alphas. • CExpModel * m_pExpModel A copy of the experimental model pointer from the experiment. • CVarModel * m_pVarModel A copy of the variance model pointer from the experiment. Detailed Description Virtual base class for computing the information of an experiment This class provides a base class for the information criteria used for computing optimal designs from which a user can easily derive new optimal design criteria and measures of experimental information. Each derived class can contain additional objects and functions to facilitate the computation of the experimental information. For each of these derived classes, the user must define a function 'eval', which computes the information provided by the experiment at a given design matrix. 252 CInfoCriterion Member Function Documentation virtual void build (const CExperiment &e, uint nMeshSize) [virtual] Build the current information object from the experiment that created it. This function initializes the protected member attributes from the experiment and builds a parameter mesh with the given size from which to compute the expectation over the prior distribution. If the value of nMeshSize is a root of the number of parameters, then the mesh will contain an equal of samples from each parameter. Parameters: e The base experiment that created this object nMeshSize Number of nodes in the parameter mesh virtual double eval (const CMatrix &mX) [pure virtual] Compute the information provided by a given design matrix for the experiment using the current measure of experimental information. By optimizing over this value, one can compute the optimal design for the experiment. Parameters: mX The design matrix containing the input stimulus Returns: The information provided by the experiment using the given criterion void fisher (const CMatrix &mX, CMatrix &mFisher) Compute the fisher information matrix for the given input vector. The matrix multiplications and other low-level operations are unwound and computed explicitly to improve speed. This function plays a critical role in the D, ED and EID design criteria. 
Parameters: mX The design matrix containing the input stimulus mFisher The Fisher information matrix is returned through this parameter The documentation for this class was generated from the following file: • C:/XRedDesign/info.h 253 CInfoD Class Reference #include <info.h> Inherits CInfoCriterion. Public Member Functions • const string & name (void) const • virtual void build (const CExperiment &e, uint nMeshSize) • virtual double eval (const CMatrix &mX)=0 • void fisher (const CMatrix &mX, CMatrix &mFisher) Detailed Description Optimal design using the D-optimal criterion. Unlike the other optimal design criteria, this one does not employ the expectation over the prior distribution. Rather, this criterion evaluates the Fisher information matrix at the mean of the defined prior distribution and computes the criterion for the experiment as the reciprocal of its determinant. The documentation for this class was generated from the following file: • C:/XRedDesign/info.h CInfoEID Class Reference #include <info.h> Inherits CInfoCriterion. Public Member Functions • const string & name (void) const • virtual void build (const CExperiment &e, uint nMeshSize) • virtual double eval (const CMatrix &mX)=0 • void fisher (const CMatrix &mX, CMatrix &mFisher) Detailed Description Optimal design using the EID-optimal criterion. This method extends the D-optimal design by evaluating the reciprocal of the determinant of the Fisher information matrix and computing the expectation over the prior distribution. The documentation for this class was generated from the following file: • C:/XRedDesign/info.h 254 CInfoED Class Reference #include <info.h> Inherits CInfoCriterion. Public Member Functions • const string & name (void) const • virtual void build (const CExperiment &e, uint nMeshSize) • virtual double eval (const CMatrix &mX)=0 • void fisher (const CMatrix &mX, CMatrix &mFisher) Detailed Description Optimal design using the ED-optimal criterion. This criterion attempts to maximize the expectation of the determinant of the Fisher information matrix over the prior distribution. To accommodate the built-in optimization algorithm, which minimizes a criterion by default, the score produced by the ‘eval’ function is multiplied by -1. The documentation for this class was generated from the following file: • C:/XRedDesign/info.h 255 CSizeCriterion Class Reference #include <info.h> Inherited by CSizeCombined, CSizeControl, and CSizeRisk. Public Member Functions Construction and Destruction • CSizeCriterion () • CSizeCriterion (const CSizeCriterion &size) • ~CSizeCriterion () User Defined for Each Derived Class • virtual const char * name (void) const =0 Return the friendly name of the optimal design criterion. • void init (CControlRegion *roi) Initializes the control region with the given shape. • virtual CVector eval (const CExperiment &e, uint nChains, optsMC opts, const CVector &vRho) Evaluate the SSD criterion for the given experiment and control region. Protected Attributes • CControlRegion * m_ROI Shape of the control region for Markov chain integration. Detailed Description Virtual base class for determining the optimal sample size of an experiment These functions allow one to perform precision-based sample size determination for the given experiment. Unlike the CInfoCriterion class, which computes the optimal design by maximizing the information provided by an experiment, this object computes the precision provided by the experiment at some user- defined design. 
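A minimal sketch of one such evaluation is given below; the experiment object e, the control-region geometry, the chain settings, and the value K = 100 are placeholder assumptions rather than values prescribed by the API.

// Sketch: score a candidate design with the expected-control SSD criterion.
// All numeric settings are illustrative placeholders.
CVector scoreDesign(const CExperiment &e)
{
    CROIRectangle roi;                            // rectangular control region
    roi.setCenter(CVector(2, "0.7; 0.2"));        // nominal parameter values (assumed)
    roi.setElongation(CVector(2, "0.07; 0.02"));  // half-widths in each direction (assumed)

    CSizeControl crit;                            // expected-control criterion
    crit.init(&roi);

    optsMC opts;
    opts.nBurnIn = 2000;                          // burn-in links to discard
    opts.nLength = 20000;                         // usable chain length
    opts.vStart  = CVector(2, "0.7; 0.2");        // chain starting point

    CVector vRho(3, "0.10; 0.15; 0.20");          // rho values at which to score the design
    return crit.eval(e, 100, opts, vRho);         // K = 100 parallel posterior chains
}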
By running the SSD procedure for an increasing range of sample sizes, the user can determine the optimal sample size as that which causes the precision to exceed some predetermined value. CSizeCriterion Member Function Documentation virtual CVector eval (const CExperiment &e, uint nChains, optsMC opts, const CVector &vRho) const [pure virtual] Evaluate the SSD criterion for the given experiment and control region. 256 The algorithm works by generating a set of K parallel posterior predictive Markov chains with data vectors drawn from the preposterior distribution, integrating each chain over the control region, and combining the results across the chains in some way. Using multiple values of rho allows the evaluation of the precision for multiple control regions without recomputing any of the Markov chains. Parameters: e The base experiment that created this criterion object nChains The number of parallel posterior predictive Markov chains (i.e., K) opts The options for generating the posterior Markov chain vRho A vector containing each of the values to evaluate the SSD criterion Returns: The SSD precision score according to the SSD criterion for each value of rho The documentation for this class was generated from the following file: • C:/XRedDesign/info.h CSizeCombined Class Reference #include <info.h> Inherits CSizeCriterion. Public Member Functions • const char * name (void) const • CVector eval (const CExperiment &e, uint nChains, optsMC opts, const CVector &vRho) Detailed Description Optimal sample size determination using both expected control and risk. Rather than computing the criterion in the 'eval' function call, this version writes the precision of each chain to a file, where one can examine the results more closely. The expected control and risk can be computed from this data. The documentation for this class was generated from the following file: • C:/XRedDesign/info.h 257 CSizeControl Class Reference #include <info.h> Inherits CSizeCriterion. Public Member Functions • const char * name (void) const • CVector eval (const CExperiment &e, uint nChains, optsMC opts, const CVector &vRho) Detailed Description Optimal sample size determination using expected control. This involves computing K posterior predictive Markov chains, integrating the chain over the control region for each value of rho, and finally averaging over the K chains to obtain the expectation. The documentation for this class was generated from the following file: • C:/XRedDesign/info.h CSizeRisk Class Reference #include <info.h> Inherits CSizeCriterion. Public Member Functions • const char * name (void) const • CVector eval (const CExperiment &e, uint nChains, optsMC opts, const CVector &vRho) Detailed Description Optimal sample size determination using risk. As with the previous SSD criterion, this computes a population of K posterior predictive Markov chains and integrates each over the control region for each value of rho. However, this criterion then counts the number of chains whose control exceeds 0.90 and determines the risk based on pt' values of 0.10 and 0.05. The documentation for this class was generated from the following file: • C:/XRedDesign/info.h 258 CControlRegion Class Reference #include <markovchain.h> Inherited by CROIEllipse, and CROIRectangle. 
Public Member Functions Construction and Destruction • CControlRegion () • CControlRegion (const CControlRegion &roi) • virtual ~CControlRegion () User Defined for Each Derived Class • virtual bool isInbounds (const CVector &v) const =0 Returns true if the data point in 'v' falls within the control region. • void setCenter (const CVector &vC) Set the center of the control region. • void setElongation (const CVector &vE) Set the elongation (radii) of the control region for each parameter. Protected Attributes • CVector m_vCenter The center of the control region. • CVector m_vElongation The elongation from center in each direction (i.e., the "radii" of the region). Detailed Description Virtual base class for the region of Markov chain integration This class provides a foundation to construct control regions of various sizes. In general, a control region is centered at some value and extends out in each of the P parameter directions for some elongation/radius value. A point in the parameter space is considered "in bounds" if it falls within the boundary of the control region. A user can define control regions of different shapes by creating a new class that inherits this base class and defining the function 'isInBounds' to define the boundary of the control region. These additional classes can employ additional shape parameters besides the center and elongation. The documentation for this class was generated from the following file: • C:/XRedDesign/markovchain.h 259 CROIEllipse Class Reference #include <markovchain.h> Inherits CControlRegion. Public Member Functions • bool isInbounds (const CVector &vX) const Detailed Description Elliptical Control Region for Markov Chain Integration The documentation for this class was generated from the following file: • C:/XRedDesign/markovchain.h CROIRectangle Class Reference #include <markovchain.h> Inherits CControlRegion. Public Member Functions • bool isInbounds (const CVector &vX) const Detailed Description Rectangular Control Region for Markov Chain Integration The documentation for this class was generated from the following file: • C:/XRedDesign/markovchain.h 260 CVector Class Reference #include <matrix.h> Public Member Functions Construction, Destruction and Assignment • CVector () • CVector (uint nLength) Practical Constructor for a vector with a specific length. • CVector (uint nLen, const string &szData) Practical Constructor that creates a vector from MATLAB string format (i.e., "1; 2; 3"). • CVector (const CVector &v) • virtual ~CVector () • CVector & operator = (const CVector &v) Vector-Wise Statistical Functions • double norm (void) const Compute the norm (magnitude) of the vector. • double mean (void) const Compute the mean vector along the vector. • double median (void) const Compute the median value of the vector. • double stddev (void) const Compute the standard deviation vector along the vector. • double variance (void) const Compute the variance vector along the vector. • double rms (void) const Compute the root mean square (RMS) along the vector. • double percentile (double fPrcTile) const Compute the given percentile of the vector data. • CVector percentile (const CVector &vPrcTile) const Compute the given percentile values for each value in 'vPrcTile' of the vector data. • double sum (void) const Compute the sum of the elements in the vector. • double product (void) const Compute the product of the elements in the vector. • CVector autocorr (uint nMaxLags=0) const Compute the autocorrelation of the current vector. 
261 Data Manipulation • uint getLength (void) const Returns the length of the vector. • const double & operator[] (uint i) const • double & operator[] (uint i) Get or set the data value of a given element of the vector using array notation (i.e., X[i]). • void reinit (uint nLen, const string &szData) Reinitialize the vector. This resizes the vector and fills its elements with the string 'szData'. • void resize (uint nLen, bool bPreserve=false) Resize the vector. If bPreserve is true, the data in the vector is preserved. • void shiftDown (uint nShift) Shift all elements in the vector down by 'nShift' elements. • void shiftUp (uint nShift) Shift all elements in the vector up by 'nShift' elements. • void sort (arrayU &pnIdx) Sort the vector elements in the order dictated by 'pnIdx'. • void sort (void) Perform a QUICKSORT to arrange the vector elements in ascending order. • void reverse (void) Reverse the order of the vector elements. • void zero (void) Assign all vector element values to zero. • CMatrix makeColMatrix (void) const Transform the vector into a column matrix with rows equal to the length. • CMatrix makeRowMatrix (void) const Transform the vector into a row matrix with columns equal to the length. • CMatrix diagonalize (void) const Create a square matrix whose diagonal elements correspond to the vector values. Boolean Tests • bool isEmpty (void) const Returns true if the vector has zero length. • bool isZero (void) const Returns true if the vector contains only zeroes. 262 Protected Attributes • unsigned int m_nLength Length of the vector • double * m_pfData Memory space for the values of the vector Friends Element-by-Element Arithmetic Operators • CVector operator * (const CVector &v1, const CVector &v2) • CVector operator * (const CVector &v, double f) • CVector operator * (const CVector &v, int n) • CVector operator * (double f, const CVector &v) • CVector operator * (int n, const CVector &v) • void operator *= (CVector &v, const CVector &v2) • void operator *= (CVector &v, double f) • void operator *= (CVector &v, int n) • CVector operator + (const CVector &v1, const CVector &v2) • CVector operator + (const CVector &v, double f) • CVector operator + (const CVector &v, int n) • CVector operator + (double f, const CVector &v) • CVector operator + (int n, const CVector &v) • void operator += (CVector &v, const CVector &v2) • void operator += (CVector &v, double f) • void operator += (CVector &v, int n) • CVector operator - (const CVector &v1, const CVector &v2) • CVector operator - (const CVector &v, double f) • CVector operator - (const CVector &v, int n) • CVector operator - (double f, const CVector &v) • CVector operator - (int n, const CVector &v) • CVector operator - (const CVector &v) • void operator -= (CVector &v, const CVector &v2) • void operator -= (CVector &v, double f) • void operator -= (CVector &v, int n) • CVector operator / (const CVector &v1, const CVector &v2) • CVector operator / (const CVector &v, double f) • CVector operator / (const CVector &v, int n) • CVector operator / (double f, const CVector &v) • CVector operator / (int n, const CVector &v) • void operator /= (CVector &v, const CVector &v2) • void operator /= (CVector &v, double f) • void operator /= (CVector &v, int n) 263 Per-Element (Scalar) Algebraic Operators • CVector fabs (const CVector &v) Compute the f.p.-absolute value (fabs) on every element in the vector. • CVector sqrt (const CVector &v) Compute the square root of each element of the vector. 
• CVector log (const CVector &v) Compute the natural logarithm of each element in the vector. • CVector exp (const CVector &v) Compute the exponential of each element in the vector. • CVector sq (const CVector &v) Compute the square (x^2) of each element of the vector. • CVector pow10 (const CVector &v) Compute 10^x of each element in the vector. • CVector log10 (const CVector &v) Compute the common (base-10) logarithm of each element in the vector. • CVector round (const CVector &v) Round each element of the vector to the nearest integer value. • CVector pow (const CVector &v, const double p) Raise each element of the vector to the power, p. • double max (const CVector &v, int *nIdx=NULL) Returns the largest value in the vector and the index (through pnIdx). • double min (const CVector &v, int *nIdx=NULL) Returns the smallest value in the vector and the index (through pnIdx). • CVector xcorr (const CVector &v1, const CVector &v2, uint nMaxLags) Compute the cross-correlation of a pair of vectors (nMaxLags defaults to length of vectors). Operators for Combining Multiple Vectors • CVector combine (const CVector &v1, const CVector &v2) Combine two vectors into a single, long vector. • CVector combine (const std::vector< CVector > &pvSubVectors) Combine an array of vectors into a single, long vector. IO Stream and Boolean Operators • std::ostream & operator << (std::ostream &os, const CVector &v) ostream operator for printing out a matrix 264 • std::istream & operator >> (std::istream &is, CVector &v) istream operator for reading in a matrix from a string (i.e., "1; 2; 3") • bool operator == (const CVector &v1, const CVector &v2) Returns true if the two vectors are identical in dimension and data. • bool operator != (const CVector &v1, const CVector &v2) Returns true if the two vectors are different in dimension or data. Detailed Description Linear data storage and floating-point data processing, sorting, and statistics This generic vector class allows the linear storage of data and the computation of linear algebraic operations. Unlike the built-in C++ standard 'vector' template object, this class integrates very smoothly into the CMatrix object and contains a number of essential data processing and sorting operators. The documentation for this class was generated from the following file: • C:/XRedDesign/matrix.h 265 CMatrix Class Reference #include <matrix.h> Public Member Functions Construction, Destruction, and Assignment • CMatrix () • CMatrix (uint nRows, uint nColumns) Practical Constructor for a ZERO matrix with a specific number of rows and columns. • CMatrix (uint nRows, uint nColumns, const string &szData) Practical Constructor that accepts MATLAB string format (i.e., "1 2 3; 4 5 6; 7 8 9"). • CMatrix (const CMatrix &m) • virtual ~CMatrix () • CMatrix & operator = (const CMatrix &m) Linear Algebraic Functions • CVector diag (void) const Return a vector composed of the diagonal elements of the matrix. • double trace (void) const Compute the trace of the matrix (the sum of the diagonal elements). • CMatrix transpose (void) const Compute the transpose the matrix (Mij --> Mji). • CMatrix inverse (void) const Compute the inverse of the matrix (i.e., X*inv(X) = I). • float det (void) const Compute the determinant of the matrix. • CMatrix cholesky (bool bWarn=true) const Compute the Cholesky decomposition of the matrix (i.e., if R = chol(X) then R'*R = X). 
• double ludecomp (CMatrix *mL, CMatrix *mU, CMatrix *mP, bool bWarn=true) const Compute the LU decomposition of the matrix (i.e., L*U = P*X). • void eigen (CVector &vEigVal, CMatrix &mEigVec) const Compute the eigenvectors and eigenvalues of the matrix. • double maxEigVal (void) const Compute the largest eigenvalue of the matrix (much faster than using 'eigen'). Row- or Column-Wise Statistical Functions • CVector mean (int nDim=0) const Compute the mean vector along the matrix rows (nDim=0) or columns (nDim=1). 266 • CVector stddev (int nDim=0) const Compute the std deviation vector along the matrix rows (nDim=0) or columns (nDim=1). • CVector variance (int nDim=0) const Compute the variance vector along the matrix rows (nDim=0) or columns (nDim=1). • CVector rms (int nDim=0) const Compute the root mean square along the matrix rows (nDim=0) or columns (nDim=1). Data Manipulation • uint getRows (void) const Returns the number of rows in the matrix. • uint getCols (void) const Returns the number of columns in the matrix. • void setType (MatType type) Set the type of the matrix, MatType is either ZERO or IDENTITY. • const double * operator[] (uint i) const • double * operator[] (uint i) Get or set the data value of a given matrix element using array notation (i.e., X[i][j]). • void reinit (uint nRows, uint nColumns, string szData) Reinitialize the matrix by resizing and filling its elements with the string 'szData'. • void resize (uint nRow, uint nColumns, bool bPreserve=false) Resize the matrix. If bPreserve is true, the data in the matrix is preserved. • CVector col2vector (uint nCol) const Extracts a column from the matrix and returns as a vector object. • CVector row2vector (uint nRow) const Extracts a row from the matrix and returns as a vector object. • void vector2col (uint nCol, const CVector &v) Replaces a column in the matrix with a vector object • void vector2row (uint nRow, const CVector &v) Replaces a row in the matrix with a vector object • void reorderRows (arrayU &pnIdx) Reorder the rows in the matrix according to the listing in pnIdx. • void reorderCols (arrayU &pnIdx) Reorder the columns in the matrix according to the listing in pnIdx. • void swapRows (uint A, uint B) Exchange the values between two rows in the matrix. 267 • void swapCols (uint A, uint B) Exchange the values between two columns in the matrix. Boolean Tests • bool isEmpty (void) const Returns true if the matrix has no dimension (i.e., zero rows and zero columns). • bool isZero (void) const Returns true if the matrix contains only zeroes. • bool isSquare (uint N=0) const Returns true if the matrix is square (i.e., number of rows == number of columns). • bool isTall (void) const Returns true if the matrix is taller than wide (i.e., number of rows > number of columns). • bool isWide (void) const Returns true if the matrix is wider than tall (i.e., number of rows < number of columns). • bool isSymmetric (void) const Returns true if the matrix is symmetric (i.e., X = X'). • bool isDiag (void) const Returns true if all off-diagonal elements in the matrix are zero. 
Protected Attributes • unsigned int m_nRows Number of rows in the matrix • unsigned int m_nColumns Number of columns in the matrix • double ** m_pfData Memory space for the values of the matrix elements Friends Full Matrix and Matrix-Scalar Arithmetic Operators • CMatrix operator * (const CMatrix &m1, const CMatrix &m2) • CMatrix operator * (const CMatrix &m, double f) • CMatrix operator * (const CMatrix &m, int n) • CMatrix operator * (double f, const CMatrix &m) • CMatrix operator * (int n, const CMatrix &m) • void operator *= (CMatrix &m, const CMatrix &m2) • void operator *= (CMatrix &m, double f) • void operator *= (CMatrix &m, int n) 268 • CMatrix operator + (const CMatrix &m1, const CMatrix &m2) • CMatrix operator + (const CMatrix &m, double f) • CMatrix operator + (const CMatrix &m, int n) • CMatrix operator + (double f, const CMatrix &m) • CMatrix operator + (int n, const CMatrix &m) • void operator += (CMatrix &m, const CMatrix &m2) • void operator += (CMatrix &m, double f) • void operator += (CMatrix &m, int n) • CMatrix operator - (const CMatrix &m1, const CMatrix &m2) • CMatrix operator - (const CMatrix &m) • CMatrix operator - (const CMatrix &m, double f) • CMatrix operator - (const CMatrix &m, int n) • CMatrix operator - (double f, const CMatrix &m) • CMatrix operator - (int n, const CMatrix &m) • void operator -= (CMatrix &m, const CMatrix &m2) • void operator -= (CMatrix &m, double f) • void operator -= (CMatrix &m, int n) • CMatrix operator / (const CMatrix &m, float f) • CMatrix operator / (const CMatrix &m, int n) • CMatrix operator / (double f, const CMatrix &m) • CMatrix operator / (int n, const CMatrix &m) • CMatrix operator / (const CMatrix &m1, const CMatrix &m2) • void operator /= (CMatrix &m, double f) • void operator /= (CMatrix &m, int n) Per-element (scalar) Algebraic Operators • CMatrix fabs (const CMatrix &m) Compute the f.p.-absolute value (fabs) on every element in the matrix. • CMatrix log (const CMatrix &m) Compute the natural logarithm of each element in the matrix. • CMatrix log10 (const CMatrix &m) Compute the common (base-10) logarithm of each element in the matrix. • CMatrix pow10 (const CMatrix &m) Compute 10^x of each element in the matrix. • CMatrix exp (const CMatrix &m) Compute the exponential of each element in the matrix. • CMatrix sqrt (const CMatrix &m) Compute the square root of each element of the matrix. • CMatrix sq (const CMatrix &m) Compute the square (x^2) of each element of the matrix. 269 • CMatrix round (const CMatrix &m) Round each element of the matrix to the nearest integer value. • double max (const CMatrix &m, int pnIdx[2]=NULL) Returns the largest value in the matrix and the index (through pnIdx). • double min (const CMatrix &m, int pnIdx[2]=NULL) Returns the smallest value in the matrix and the index (through pnIdx). IO Stream and Boolean Operators • std::ostream & operator << (std::ostream &os, const CMatrix &m) ostream operator for printing out a matrix in row-column format • std::istream & operator >> (std::istream &is, CMatrix &m) istream operator for reading in a matrix from a string (i.e., "1 2 3; 4 5 6; 7 8 9") • bool operator == (const CMatrix &m1, const CMatrix &m2) Returns true if the two matrices are identical in dimension and data. • bool operator != (const CMatrix &m1, const CMatrix &m2) Returns true if the two matrices are different in dimension or data. 
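For reference, a brief usage sketch of the vector and matrix interfaces listed above follows; the numeric values are arbitrary and only member functions documented in this appendix are used.

// Sketch: basic CVector / CMatrix interoperation with arbitrary values.
#include "matrix.h"
#include <iostream>

void matrixDemo(void)
{
    CMatrix mA(3, 3, "4 1 0; 1 3 1; 0 1 2");    // MATLAB-style string constructor
    CMatrix mInv = mA.inverse();                 // mA * mInv is (numerically) the identity
    CVector vDiag = mA.diag();                   // diagonal elements as a CVector

    CVector v(3, "1; 2; 3");
    CMatrix mOuter = v.makeColMatrix() * v.makeRowMatrix();   // 3-by-3 outer product

    std::cout << mA * mInv << std::endl;         // prints a near-identity matrix
}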
Detailed Description Two-dimensional floating-point data storage and linear algebra/statistics This generic matrix class allows the storage of two-dimensional data and the computation of linear algebraic operations. This class contains a number of operators for the arithmetic combination of matrices with other matrices, vectors, and scalar quantities, as well as unary operators that allow the manipulation of the data within the matrix. This class also contains various statistical functions for analyzing the data along the rows or columns. The documentation for this class was generated from the following file: • C:/XRedDesign/matrix.h 270 CMarkovChain Class Reference #include <markovchain.h> Public Member Functions Construction, Destruction, and Assignment • CMarkovChain () • CMarkovChain (uint nDimension, uint nLength) • CMarkovChain (const CMarkovChain &mc) • ~CMarkovChain () • CMarkovChain & operator = (const CMarkovChain &mc) Data Manipulation • uint getLength (void) const • uint getDimension (void) const • const CVector & getZ (void) const • const CVector & getLink (uint i) const Returns the i th element/link of the Markov chain • void resize (uint nDim, uint nLength) Change the size of the Markov chain in length and dimension. • void grow (const CExperiment &e, const CVector &vZ, const optsMC &opts) • void growToPrior (const CExperiment &e, const optsMC &opts) • double integrateROI (const CControlRegion &roi) const Statistical Analysis of Markov Chain • double kde (const CVector &vX, const CMatrix &mInvCov, double fWindow) const • CVector mode (const optsNM &opts, statsOpt *stats) const • CMatrix cred (const float fSig=0.95F) const • void acorr (uint nParam, arrayU *pnLags, arrayD *pdCorr, bool bNorm=true) const • void acorr (arrayU *pnLags, arrayD *pdCorr, bool bNorm=true) const • void acorr (arrayU *pnLags, array2D *pdCorr, bool bNorm=true) const • CVector mean (void) const Compute the mean of the Markov chain data. • CVector median (void) const Compute the median of the Markov chain data. • CMatrix cov (void) const Compute the covariance matrix of the Markov chain data. • CMatrix precision (const CVector &vEstim) const Compute the precision of the chain (covariance about the value of vEstim). • CMatrix percentile (const CVector &vPtile) const • CVector percentile (const double fPtile) const Compute a given percentile of the Markov chain, or many percentile values at once. 271 Protected Attributes • CVector * m_pvData The Markov chain data. • unsigned int m_nLength The length of the Markov chain. • unsigned int m_nDimension The dimension of the Markov chain (i.e., P). • CVector m_vZ The observation vector that generated the current chain. Detailed Description Class for the generation, storage, access, and processing of Markov chain data This is a data object designed to generate and store a Markov chain. The data is structured as an array of vectors that represents a series of random samples from the posterior distribution. The class contains functions to create a fill a Markov chain object with values that reflect either a prior or a posterior distribution. In addition, CMarkovChain includes several statistical functions to analyze the composure of the chain, including finding the measures of central tendency and dispersion, chain integration to determine precision, percentiles, and credibility intervals. 
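For illustration, a minimal usage sketch follows; the experiment e, the observation vector vZ, and the control region roi are assumed to have been constructed elsewhere, and the chain lengths and starting point are placeholders.

// Sketch: grow a posterior chain and summarize it (illustrative settings).
void summarizeChain(const CExperiment &e, const CVector &vZ, const CControlRegion &roi)
{
    optsMC opts;
    opts.nBurnIn = 2000;                        // burn-in links to discard (placeholder)
    opts.nLength = 20000;                       // usable chain length (placeholder)
    opts.vStart  = CVector(2, "0.7; 0.2");      // starting point (placeholder values)

    CMarkovChain mc;
    mc.grow(e, vZ, opts);                       // posterior chain via Independence Metropolis-Hastings

    CVector vMean   = mc.mean();                // posterior mean
    CMatrix mCred   = mc.cred(0.95F);           // 95% credibility bounds per parameter
    double  fInside = mc.integrateROI(roi);     // fraction of links inside the control region
}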
Member Function Documentation void grow (const CExperiment &e, const CVector &vZ, const optsMC &opts) Grow the Markov chain corresponding to the posterior distribution for the designated experiment using the Independence Metropolis-Hastings algorithm. Parameters: e The experiment object that generates the posterior distribution vZ The data vector to use in the likelihood function opts The options for generating the posterior Markov chain void growToPrior (const CExperiment &e, const optsMC &opts) Grows a Markov chain object that accepts every proposed point, giving it the shape of the prior distribution. While not technically a Markov chain, since it does not exhibit the Markov property, such an easily predictable random sequence is very useful while debugging and testing algorithms that employ the Markov chain object. 272 Parameters: e The experiment object that generates the posterior distribution opts The options for generating the posterior Markov chain double integrateROI (const CControlRegion &roi) const Integrate the posterior Markov chain over a designated control region by determining the fraction of chain "links" that fall within the region. Parameters: roi The control region over which to integrate the Markov chain Returns: The fraction of the chain that falls within the control region double kde (const CVector &vX, const CMatrix &mInvCov, double fWindow) const Compute the multivariate pdf estimate of the posterior distribution from the Markov chain data using rescaled and transformed normal kernels to eliminate the need for multiple window widths. This function is called recursively by 'mode' in order to find the maximum of the posterior pdf. Parameters: vX The point in parameter space at which to estimate the posterior pdf mInvCov The inverse covariance of the Markov chain, used for scaling the window fWindow The base window width, h Returns: The pdf estimate at parameter 'vX' CVector mode (const optsNM &opts, statsOpt *stats) const Compute the mode of the Markov chain by finding the maximum of the kernel density estimate of the chain data. The optimization uses the Nelder-Mead algorithm. Parameters: opts The options for the Nelder-Mead optimization algorithm (optional) stats The optimization statistics returned by the optimizer (optional) Returns: A vector that corresponds to the mode of the Markov chain CMatrix cred (const float fSig = 0.95F) const Compute the credibility interval of a Markov chain for each parameter individually. Parameters: fSig The significance value of the credibility interval Returns: A P-by-2 matrix that contains the lower and upper credibility boundaries for each parameter 273 void acorr (uint nParam, arrayU *pnLags, arrayD *pdCorr, bool bNorm = true) const Compute the positive zero-lag-normalized autocorrelation of the Markov chain for the indicated dimension, or all dimensions simultaneously using different input/output parameter types. Parameters: nParam The dimension of the parameter space to analyze pnLags Returns an array containing the lag count for each value in 'pdCorr' pdCorr Returns the autocorrelation at each lag for one or all of the parameters bNorm If true, normalize the autocorrelation to be equal to one at zero lag (default = true) The documentation for this class was generated from the following file: • C:/XRedDesign/markovchain.h arrayMC Struct Reference #include <markovchain.h> Public Member Functions • void combine (CMarkovChain &mcChain) const Combine all of the Markov chains in the array into a single, long chain. 
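As a sketch only, pooling a population of chains might look as follows; how the arrayMC object is filled with its K grown chains is assumed to have happened elsewhere.

// Sketch: pool K parallel chains into one long chain for pooled statistics.
void pooledStatistics(const arrayMC &chains)
{
    CMarkovChain mcPooled;
    chains.combine(mcPooled);        // concatenate all chains into mcPooled
    CMatrix mCov = mcPooled.cov();   // covariance of the pooled posterior samples
}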
Protected Attributes • CMarkovChain * m_pmcData The Markov chain data • unsigned int m_nSize The number of Markov chains in the array (i.e., K). Detailed Description Class to handle an array of multiple CMarkovChain objects. This allows easy management and storage of groups of parallel Markov chains, which is critical to the algorithms in the XRedDesign API. The documentation for this struct was generated from the following file: • C:/XRedDesign/markovchain.h 274 optsFZ Struct Reference #include <models.h> Public Attributes • double fBoundsLow The lower bound of the bracketed region. • double fBoundsHigh The upper bound of the bracketed region. • double fTolX The maximum tolerance required to determine convergence. • unsigned int nMaxIt The maximum number of iterations. Detailed Description Options inputted into the Newton-Raphson root finder This structure provides an input to the Newton-Raphson algorithm used to find the zeroes of an objective function. The upper and lower boundaries of the bracketed region must have functional values with different signs, so that only one zero lies between them. The documentation for this struct was generated from the following file: • C:/XRedDesign/models.h optsMC Struct Reference #include <markovchain.h> Public Attributes • unsigned int nBurnIn Length of burn-in segment of chain (discarded data). • unsigned int nLength Length of usable portion of chain. • CVector vStart Vector corresponding to the starting point of the chain. 275 Detailed Description Structure containing options for Markov chain generation. Each function in the XRedDesign API that generates Markov chains accepts this structure as an input parameter, which designates the length of the burn-in period of the chain, the total length of the chain, and the value that starts the Markov chain. The documentation for this struct was generated from the following file: • C:/XRedDesign/markovchain.h optsNM Struct Reference #include <experiment.h> Public Attributes • double fTolX Starting condition for the optimization. • double fTolF Maximum input tolerance to be convergent. • bool bEstVar Maximum ouput tolerance to be convergent. • bool bPosCon 'true' to estimate the variance parameters • uint nMaxIt positivity constraint for the parameter estimation Detailed Description Options inputted into the Nelder-Mead function for local optimization. The Nelder-mead simplex algorithm used in maximum likelihood estimation and mode finding accepts an object of this type as one of its inputs. This includes simplex tuning parameters such as the starting value for the search, maximum number of iterations to perform, and maximum radius to determine the convergence of the simplex. The user can change each of these values by creating an optsNM object, changing the appropriate value, and passing the new object into the optimization algorithm. 
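For example (a sketch only; the tolerances are placeholder values and mc is assumed to be a previously grown CMarkovChain):

// Sketch: tune the Nelder-Mead options before a mode search.
CVector findMode(const CMarkovChain &mc)
{
    optsNM opts;                    // options object; only a few fields changed here
    opts.fTolX  = 1.0e-6;           // convergence tolerance on the simplex (placeholder)
    opts.fTolF  = 1.0e-8;           // convergence tolerance on the function value (placeholder)
    opts.nMaxIt = 5000;             // iteration cap (placeholder)

    statsOpt stats;                 // optimization statistics returned by the search
    return mc.mode(opts, &stats);   // kernel-density mode of the chain
}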
The documentation for this struct was generated from the following file: • C:/XRedDesign/experiment.h 276 optsRCr Struct Reference #include <experiment.h> Public Attributes • CVector vStart Starting condition for the optimization • double fSmax Largest stepsize to search before switching to inward search • double fXs Factor to increase the stepsize if search fails • uint nM Length of each random walk before expanding step • double fTolX Smallest stepsize to search before terminating the optimization • double fTolF Percent change in score to be considered significant • uint nMaxIt Maximum number of iterations allowed Detailed Description Options inputted into the RCreep function for optimal design determination The random creep global optimization algorithm accepts an object of this structure as one of its inputs. This sets the optimization parameters such as the starting value, number of random steps per step, and the maximum and minimum search radii. The user can change each of these values by creating an optsRCr object, changing the appropriate value, and passing the new object into the optimization algorithm. The documentation for this struct was generated from the following file: • C:/XRedDesign/experiment.h 277 statsOpt Struct Reference #include <experiment.h> Public Attributes • CVector vStart Starting value for the optimization • double fYini Initial function value • double fYfin Final (optimized) function value • double fTime Time to find solution (in minutes) • uint nFcall Number of function calls made during the optimization • bool bMaxIt Set to ‘true’ if maximum number of iterations made, ‘false’ otherwise Detailed Description Statistics returned from an optimization function Each local or global optimization function returns an object of this structure. It records all of the statistics of the optimization process including the number of function evaluations used, time elapsed, and initial and final criterion values. This can be useful for benchmarking and troubleshooting various optimization algorithms. The documentation for this struct was generated from the following file: • C:/XRedDesign/experiment.h 278 C:/XRedDesign/extras.h File Reference Typedefs • typedef unsigned int uint • typedef std::vector< double > arrayD • typedef std::vector< int > arrayN • typedef std::vector< uint > arrayU • typedef std::vector< arrayD > array2D • typedef std::vector< arrayN > array2N • typedef std::vector< arrayU > array2U Global Functions • double randu (double A, double B) Generates uniform deviates between A and B • double randn (double fMean, double fStdDev) Generates normal deviates with mean and standard deviation using Box-Muller method • double pow10 (const double P) Returns the values of 10^P (inverse of log10) • double sq (const double P) Returns the value of P^2 Define Documentation #define ERROR(msg) Value:{ fprintf(stderr, "\nERROR in %s at line %d:\n%s: %s\n", \ getFileName(__FILE__).c_str(), __LINE__, __PRETTY_FUNCTION__, msg); exit(1); } Displays that an error has occurred and terminates the program with error code 1. 
Shows the file name, line, and class/function that caused the error #define INFO(msg) Value:{ fprintf(stderr, "\n%s \n", msg); } Displays a generic, developer-defined message to the user #define WARNING(msg) Value:{ fprintf(stderr, "\nWARNING from %s at line %d:\n%s: %s\n", \ getFileName(__FILE__).c_str(), __LINE__, __PRETTY_FUNCTION__, msg); } Displays a warning to the user that shows the file name, line of code, and class/function that caused the error 279 Appendix B: XRedDesign Software Validation Routines and Results To ensure the accuracy of the algorithms in the XRedDesign API, this work subjected the software to a battery of validation trials. Section 5.4 outlined the general validation procedure; this appendix provides the specific tests and their results. For each test, the control results are displayed in black ink, while the XRedDesign program outputs are listed in red ink. The author hopes that such a thorough validation procedure will further edify the results presented in this dissertation and inspire the use of this software in the scientific community. B.1 Validation of Random Number Generator and Prior Distribution Classes 1.1 Normal Prior: ( ) ( ) 0.75 ; 0.1117 μ=Σ= x = [0.50] PDF = 0.9024 PDF = 0.9024 x = [-0.25] PDF = 0.0136 PDF = 0.0136 x = [0.23] PDF = 0.3558 PDF = 0.3558 x = [-0.75] PDF = 5.0477e-5 PDF = 5.0447e-5 Random Sample Statistics: ( ) ( ) 0.7505 ; 0.1119 μ=Σ= 1.2 Normal Prior: 0.6886 2.1469 0.4877 1.6357 0.7655 ; 0.4877 5.7522 0.4520 0.8377 1.6357 0.4520 2.8427 μ ⎛⎞ ⎛ ⎞ ⎜⎟ ⎜ ⎟ =Σ= ⎜⎟ ⎜ ⎟ ⎜⎟ ⎜ ⎟ ⎝⎠ ⎝ ⎠ x = [1.3248; 1.5220; 0.5823] PDF = 0.0107 PDF = 0.0107 x = [0.3248; 2.5220; 1.5823] PDF = 0.0076 PDF = 0.0076 x = [0.3248; 0.5220; 0.0823] PDF = 0.0130 PDF = 0.0130 x = [-1.3248; 0.5220; -05823] PDF = 0.0056 PDF = 0.0056 Random Sample Statistics: 0.6874 2.1517 0.4946 0.8307 0.7648 ; 0.4946 5.7590 0.4478 0.8307 0.8307 0.4478 2.8500 μ ⎛⎞ ⎛ ⎞ ⎜⎟ ⎜ ⎟ =Σ= ⎜⎟ ⎜ ⎟ ⎜⎟ ⎜ ⎟ ⎝⎠ ⎝ ⎠ 1.3 Normal Prior: 1.1326 1.0703 0.0496 ; 1.1685 0.0496 2.9062 μ ⎛⎞ ⎛ ⎞ =Σ= ⎜⎟ ⎜ ⎟ ⎝⎠ ⎝ ⎠ 280 x = [0.2279; 0.1720] PDF = 0.0526 PDF = 0.0526 x = [1.2279; -0.1720] PDF = 0.0658 PDF = 0.0658 x = [3.2279; 2.1720] PDF = 0.0101 PDF = 0.0101 x = [-1.2279; 1.1720] PDF = 0.0067 PDF = 0.0067 Random Sample Statistics: 1.1348 1.0709 0.0509 ; 1.1687 0.0509 2.9011 μ ⎛⎞ ⎛ ⎞ =Σ= ⎜⎟ ⎜ ⎟ ⎝⎠ ⎝ ⎠ 1.4 Lognormal Prior: ( ) ( ) 0.60 ; 0.0117 μ=Σ= x = [0.50] PDF = 2.8950 PDF = 2.8950 x = [3.25] PDF = 1.2261E-20 PDF = 1.23E-20 x = [1.53] PDF = 1.0218E-6 PDF = 1.022E-6 x = [0.45] PDF = 1.5634 PDF = 1.5634 Random Sample Statistics: ( ) ( ) 0.5999 ; 0.0117 μ=Σ= 1.5 Lognormal Prior: 0.6886 2.1469 0.4877 1.6357 0.7655 ; 0.4877 5.7522 0.4520 0.8327 1.6357 0.4520 2.8427 μ ⎛⎞ ⎛ ⎞ ⎜⎟ ⎜ ⎟ =Σ= ⎜⎟ ⎜ ⎟ ⎜⎟ ⎜ ⎟ ⎝⎠ ⎝ ⎠ x = [1.3248; 1.5220; 0.5823] PDF = 0.0080 PDF = 0.0080 x = [0.3248; 2.5220; 1.5823] PDF = 0.0019 PDF = 0.0016 x = [0.3248; 0.5220; 0.0823] PDF = 0.2954 PDF = 0.2954 x = [1.3248; 0.5220; 0.5823] PDF = 0.0345 PDF = 0.0345 Random Sample Statistics: 0.6867 2.0233 0.4700 1.5927 0.7687 ; 0.4700 6.4466 0.4305 0.8325 1.5927 0.4365 2.9211 μ ⎛⎞ ⎛ ⎞ ⎜⎟ ⎜ ⎟ =Σ= ⎜⎟ ⎜ ⎟ ⎜⎟ ⎜ ⎟ ⎝⎠ ⎝ ⎠ 1.6 Lognormal Prior: 1.1326 1.0703 0.0496 ; 1.1685 0.0496 2.9062 μ ⎛⎞ ⎛ ⎞ =Σ= ⎜⎟ ⎜ ⎟ ⎝⎠ ⎝ ⎠ x = [0.2279; 0.1720] PDF = 0.5996 PDF = 0.5996 x = [2.2279; 0.1720] PDF = 0.0952 PDF = 0.0952 x = [0.2279; 1.1720] PDF = 0.1477 PDF = 0.1477 x = [1.2279; 2.1720] PDF = 0.0350 PDF = 0.0350 Random Sample Statistics: 1.1308 1.0571 0.0508 ; 1.1688 0.0508 2.9784 μ ⎛⎞ ⎛ ⎞ =Σ= ⎜⎟ ⎜ ⎟ ⎝⎠ ⎝ ⎠ 281 1.7 Uniform Prior: () ( ) 0.75 ; 0.0117 μ=Σ= x = [0.50] PDF = 0.0000 PDF = 
0.0000 x = [0.65] PDF = 2.6688 PDF = 2.6688 x = [1.53] PDF = 0.0000 PDF = 0.0000 x = [0.85] PDF = 2.6688 PDF = 2.6688 Random Sample Statistics: ( ) ( ) 0.7502 ; 0.0117 μ=Σ= 1.8 Uniform Prior: 0.6886 2.1469 0 0 0.7655 ; 0 5.7522 0 0.8327 0 0 2.8427 μ ⎛⎞ ⎛ ⎞ ⎜⎟ ⎜ ⎟ =Σ= ⎜⎟ ⎜ ⎟ ⎜⎟ ⎜ ⎟ ⎝⎠ ⎝ ⎠ x = [1.3248; 1.5220; 0.5823] PDF = 4.06E-3 PDF = 4.060E-3 x = [8.3248; 2.5220; 1.5823] PDF = 0.0000 PDF = 0.0000 x = [0.3248; 0.5220; 0.0823] PDF = 4.06E-3 PDF = 4.060E-3 x = [1.3248; 0.5220; 6.5823] PDF = 0.0000 PDF = 0.0000 Random Sample Statistics: 0.6881 2.1493 1.84 3 4.26 5 0.7592 ; 1.84 3 5.7532 3.08 3 0.8345 4.26 5 3.08 3 2.8426 EE EE EE μ −− − − ⎛⎞ ⎛ ⎞ ⎜⎟ ⎜ ⎟ =Σ=− − − ⎜⎟ ⎜ ⎟ ⎜⎟ ⎜ ⎟ −− − ⎝⎠ ⎝ ⎠ 1.9 Uniform Prior: 1.1326 1.0703 0 ; 1.1685 0 2.9062 μ ⎛⎞ ⎛ ⎞ =Σ= ⎜⎟ ⎜ ⎟ ⎝⎠ ⎝ ⎠ x = [0.2279; 0.1720] PDF = 0.0473 PDF = 0.0473 x = [5.2279; 6.1720] PDF = 0.0000 PDF = 0.0000 x = [2.2279; -1.1720] PDF = 0.0473 PDF = 0.0473 x = [3.2279; 2.1720] PDF = 0.0000 PDF = 0.0000 Random Sample Statistics: 1.1337 1.0713 7.85 4 ; 1.1701 7.85 4 2.9084 E E μ −− ⎛⎞ ⎛ ⎞ =Σ= ⎜⎟ ⎜ ⎟ −− ⎝⎠ ⎝ ⎠ 282 B.2 Validation of Information Criterion Functions D-OPTIMAL CRITERION: (VERIFIED ANALYTICALLY) 2.1. ExpDecay, PowerVAR, θ = 1.0, Delta Prior: ( ) ( ) 0.75 , 0.00 μ=Σ= σ = [0.50; 0.00] x = [1.333] CRIT = 1.0391 CRIT = 1.0391 σ = [0.50; 0.50] x = [2.667] CRIT = 0.2598 CRIT = 0.2598 σ = [0.50; 0.75] x = [5.334] CRIT = 0.0649 CRIT = 0.0649 2.2. RiseFall, PowerVAR, θ = 1.0, Delta Prior: 0.7 0.0 0.0 , 0.2 0.0 0.0 μ ⎛⎞ ⎛ ⎞ =Σ= ⎜⎟ ⎜ ⎟ ⎝⎠ ⎝ ⎠ σ = [0.50; 0.00] x = [1.229; 6.863] CRIT = 0.0952 CRIT = 0.0952 σ = [0.50; 0.50] x = [0.845; 11.719] CRIT = 0.0107 CRIT = 0.0107 σ = [0.50; 0.75] x = [0.520; 21.868] CRIT = 0.0016 CRIT = 0.0016 2.3. Hill3, PowerVAR, Delta Prior: 1.0 0.0 0.0 0.0 4.0 , 0.0 0.0 0.0 2.0 0.0 0.0 0.0 μ ⎛⎞ ⎛ ⎞ ⎜⎟ ⎜ ⎟ =Σ= ⎜⎟ ⎜ ⎟ ⎜⎟ ⎜ ⎟ ⎝⎠ ⎝ ⎠ σ = [0.50; 0.00] x = [2.37; 6.74; 564.4] CRIT = 41.753 CRIT = 41.753 σ = [0.50; 0.50] x = [1.23; 4.71; 366.7] CRIT = 4.7212 CRIT = 4.7212 σ = [0.50; 0.75] x = [0.39; 3.20; 348.4] CRIT = 0.6385 CRIT = 0.6385 2.4. Hill4, PowerVAR, Delta Prior: 100.0 0000 1.0 0000 , 1.5 0000 10.0 0000 μ ⎛⎞ ⎛ ⎞ ⎜⎟ ⎜ ⎟ ⎜⎟ ⎜ ⎟ =Σ= ⎜⎟ ⎜ ⎟ ⎜⎟ ⎜ ⎟ ⎜⎟ ⎜ ⎟ ⎝⎠ ⎝ ⎠ σ = [0.50; 0.00] x = [9.E-6; 0.500; 2.004; 5.E+3] CRIT = 9.93E-9 CRIT = 9.93E-9 σ = [0.50; 0.50] x = [9.E-6; 0.673; 3.058; 264.25] CRIT = 0.0208 CRIT = 0.0208 σ = [0.50; 0.75] x = [9.E-6; 0.804; 3.831; 466.40] CRIT = 24.8730 CRIT = 24.8730 283 EID-OPTIMAL CRITERION: (VERIFIED BY MATLAB PROGRAM) 2.1. ExpDecay, PowerVAR, θ = 1.0, Normal Prior: ( ) ( ) 0.75 , 0.0117 μ=Σ= σ = [0.50; 0.00] x = [1.333] CRIT = 1.0864 CRIT = 1.0832 σ = [0.50; 0.50] x = [2.665] CRIT = 0.2706 CRIT = 0.2708 2.2. RiseFall, PowerVAR, θ = 1.0, σ = [0.50; 1.00] Normal Prior: 0.7 0.0608 0.0050 , 0.2 0.0050 0.0133 μ ⎛⎞ ⎛ ⎞ =Σ= ⎜⎟ ⎜ ⎟ ⎝⎠ ⎝ ⎠ x = [1.237; 6.865] CRIT = 0.0039 CRIT = 0.0039 2.3. RiseFall, PowerVAR, θ = 1.0, σ = [0.25; 0.50] Uniform Prior 0.7 0.0608 0.0050 , 0.2 0.0050 0.0133 μ ⎛⎞ ⎛ ⎞ =Σ= ⎜⎟ ⎜ ⎟ ⎝⎠ ⎝ ⎠ x = [1.237; 6.865; 6.875; 1.230] CRIT = 3.2331E-4 CRIT = 3.2333E-4 2.4. Hill3, PowerVAR, σ = [0.50; 0.50] LN: 1.0 0.2208 0.0200 0.0010 4.0 , 0.0200 0.6333 0.0060 2.0 0.0010 0.0060 0.2833 μ ⎛⎞ ⎛ ⎞ ⎜⎟ ⎜ ⎟ =Σ= ⎜⎟ ⎜ ⎟ ⎜⎟ ⎜ ⎟ ⎝⎠ ⎝ ⎠ x = [9.0E-6; 0.5; 1.0; 2.004; 10; 5.0E+3] CRIT = 4.6360 CRIT = 4.6369 ED-OPTIMAL CRITERION: (VERIFIED BY MATLAB PROGRAM) 2.1. ExpDecay, PowerVAR, θ = 1.0, Uniform Prior: ( ) ( ) 0.75 , 0.0117 μ=Σ= σ = [0.50; 0.50] x = [2.665; 2.664] CRIT = -8.0076 CRIT = -8.0228 σ = [0.50; 0.75] x = [5.331; 5.336] CRIT = -32.1145 CRIT = -32.0936 2.2. 
RiseFall, PowerVAR, θ = 1.0, σ = [0.50; 0.50] Uniform Prior 0.7 0.0408 0.0000 , 0.2 0.0000 0.0033 μ ⎛⎞ ⎛ ⎞ =Σ= ⎜⎟ ⎜ ⎟ ⎝⎠ ⎝ ⎠ x = [2.237; 6.865] CRIT = -47.4881 CRIT = -47.4799 284 2.3. RiseFall, PowerVAR, θ = 1.0, σ = [0.50; 0.50] Normal Prior: 0.7 0.0408 0.0020 , 0.2 0.0020 0.0033 μ ⎛⎞ ⎛ ⎞ =Σ= ⎜⎟ ⎜ ⎟ ⎝⎠ ⎝ ⎠ x = [1.237; 6.865; 6.875; 1.230] CRIT = -308.699 CRIT = -308.613 2.4. Hill3, PowerVAR, σ = [0.50; 0.50] LN: 1.0 0.2208 0.0200 0.0010 4.0 , 0.0200 0.6333 0.0060 2.0 0.0010 0.0060 0.2833 μ ⎛⎞ ⎛ ⎞ ⎜⎟ ⎜ ⎟ =Σ= ⎜⎟ ⎜ ⎟ ⎜⎟ ⎜ ⎟ ⎝⎠ ⎝ ⎠ x = [1.0E-6; 0.5; 1.0; 2.004; 10.0; 5.0E+3] CRIT = -0.4681 CRIT = -0.4682 BD-OPTIMAL CRITERION: (VERIFIED BY MATLAB PROGRAM) 2.1. ExpDecay, PowerVAR, θ = 1.0, Lognormal Prior: ( ) ( ) 0.75 , 0.0469 μ=Σ= σ = [0.50; 0.50] x = [2.665] CRIT = -0.2039 CRIT = -0.2038 σ = [0.50; 0.75] x = [5.331] CRIT = -0.4016 CRIT = -0.4016 2.2. RiseFall, PowerVAR, θ = 1.0, σ = [0.50; 0.50] Normal Prior: 0.7 0.0408 0.0100 , 0.2 0.0100 0.0033 μ ⎛⎞ ⎛ ⎞ =Σ= ⎜⎟ ⎜ ⎟ ⎝⎠ ⎝ ⎠ x = [1.237; 1.230] CRIT = -2.3588 CRIT = -2.3587 2.3. RiseFall, PowerVAR, θ = 1.0, σ = [0.50; 0.50] Uniform Prior 0.7 0.0408 0.0000 , 0.2 0.0000 0.0033 μ ⎛⎞ ⎛ ⎞ =Σ= ⎜⎟ ⎜ ⎟ ⎝⎠ ⎝ ⎠ x = [1.237; 6.865; 6.875; 1.230] CRIT = -1.8065 CRIT = -1.8085 2.4. Hill3, PowerVAR, σ = [0.50; 0.75] LN: 1.0 0.2208 0.0200 0.0010 4.0 , 0.0200 0.6333 0.0060 2.0 0.0010 0.0060 0.2833 μ ⎛⎞ ⎛ ⎞ ⎜⎟ ⎜ ⎟ =Σ= ⎜⎟ ⎜ ⎟ ⎜⎟ ⎜ ⎟ ⎝⎠ ⎝ ⎠ x = [9.0E-6; 0.5; 1.0; 2.004; 10.0; 5.0+3] CRIT = 1.5012 CRIT = 1.5008 285 B.3 Global optimization algorithm (verified by literature and plots) 3.1 Rosenbrock Banana: N=5, M=2, 0 local, global = [1, 1; …] = 0 3.2 4-d Powell: N=8, M=4, 0 local, global = [0, 0, 0, 0; …] = 0 3.3 Goldstein-Price: N=5, M=2, 4 local, global = [0, -1; …] = 3 3.4 Three-hump Camel Back: N=10, M=2, 2 local, global = [0, 0; …] = 0 3.5 Hosaki: N=5, M=2, 1 local, global = [4, 2; …] = -2.345 3.6 Multipeak: N=2, M=2, 9 local, global = [3.92, 3.98; …] = -2.145 286 B.4 Optimal Designs Using Cases from Literature 4.1 ExpDecay, PowerVAR, θ = 1.0 (validated analytically) D-OPTIMAL DESIGN: ( ) () ( ) () () () () () () () () () 1 2 01 2 0 1 22 2 0 11 22 * 1 ; , exp exp 2 exp exp 2 1 ;, 2 1 exp 2 1 1 0@ 1 ,0 1 D D D Ix x x x x x x x Ix x xx x optimum xx αθ θ α σ σα θ α σ ασ θ αθ σ ασ α σ θ ασ − − ⎡⎤ =− − × ×− − ⎡⎤ ⎡⎤ ⎣⎦ ⎣⎦ ⎣⎦ =− ∂ ⎧⎫ =− −− ⎨⎬ ∂ ⎩⎭ = =≠ − EID-OPTIMAL DESIGN: () () () ( ) ( ) () ()() () () ()( ) () () () () {} () () () () () () () () 1 2 01 2 0 1 22 1 2 0 11 32 1 1 1 2 1 0 32 1 1 ; , exp exp 2 exp 1 exp 2 1 1 exp 2 1 exp 2 1 21 1.5 1 exp 2 1 1 ;, 1.5 1 1 EID p B A EID Ix x x x x x xd BA x Bx Ax xBA Ax Ax x Ix x xBA Bx x α αθ θ α σ σα θ α σ ασ α θσ σ σσ θσ σ σ σ αθ σ θ σ σ − − ⎡⎤ =− − × ×− − ⎡⎤ ⎡⎤ ⎣⎦ ⎣⎦ ⎣⎦ =− − − =−−− −− ⎛⎞ −− − ⎜⎟ ⎜⎟ − ∂ ⎝⎠ = ∂ − ⎛ +− − − ∫ E () () () () ()( ) 1 1 * 1 1 exp 2 1 0@ 1.5 1 :log 2 1 1.5 1 EID Bx optimum Ax xsolves xB A Bx σ σ σ σ ⎧ ⎫ ⎪ ⎪ ⎪ ⎪ ⎨ ⎬ ⎞ ⎪ ⎪ − ⎜⎟ ⎪ ⎪ ⎜⎟ ⎝⎠ ⎩⎭ = ⎛⎞ −− =− − ⎜⎟ ⎜⎟ −− ⎝⎠ 287 D-OPT, σ = [0.50, 0.00], α = 0.75 x* = [1.3333] x* = [1.3334] EID-OPT, σ = [0.50, 0.00], U(0.75, 0.0117) x* = [1.2828] x* = [1.2828] D-OPT, σ = [0.50, 0.50], α = 0.75 x* = [2.6667] x* = [2.6668] EID-OPT, σ = [0.50, 0.50], U(0.75, 0.0117) x* = [2.5655] x* = [2.5656] D-OPT, σ = [0.50, 0.75], α = 0.75 x* = [5.3333] x* = [5.3334] EID-OPT, σ = [0.50, 0.75], U(0.75, 0.0117) x* = [5.1311] x* = [5.1311] 4.2 ExpDecay2, ConstVAR (Pronzato and Walter, 1985) ED-OPT, U(1,10) x* = [0; 0.501] x* = [8.7996E-12; 0.5004] D-OPT, U(1,10) x* = [0; 0.182] x* = [1.1346E-11; 0.1818] ED-OPT N(5.5, 1.5) x* = [0; 0.222] x* = [4.1021E-11; 0.2046] D-OPT, 
N(5.5, 1.5) x* = [0; 0.182] x* = [5.0026E-10; 0.1818] 4.3 Rise Fall, ConstVAR (Box and Lucas, 1959) D-OPT x* = [1.23; 6.86] x* = [1.2294; 6.8581] EID-OPT (cov = 1.E-6) x* = [1.23; 6.86] x* = [1.2295; 6.8578] 4.4 Hill3, PowerVAR (Bezeau and Endreyni, 1986) D-OPT, σ = [0.50, 0.00] x* = [2.3732; 6.7433; ∞] x* = [2.3738; 6.7408; 2.302E+4] EID-OPT, σ = [0.50, 0.00] x* = [2.3732; 6.7433; ∞] x* = [2.3738; 6.7403; 1.026E+5] D-OPT, σ = [0.50, 0.50] x* = [1.2238; 4.7057; ∞] x* = [1.2251; 4.7094; 5.275E+3] EID-OPT, σ = [0.50, 0.00] x* = [2.3732; 6.7433; ∞] x* = [1.2252; 4.7092; 1.175E+7] 288 B.5 Markov Chain Generation, Statistics, & Diagnostics (verified in MATLAB) 5.1 ExpDecay, QuadrVAR: Normal Prior: ( ) ( ) 0.75 ; 0.123 μ=Σ= σ = [0.05; 0.10], θ = [10.0], x = [0.5; 0.6; 0.7; 0.8; 0.9]; α = [0.5] z = [7.6; 7.4; 7.3; 6.7; 6.6] Median = [0.5033] Median = [0.5033] Mean = [0.5021] Mean = [0.5021] Mode = [0.5056] Mode = [0.5034] 15, 45, 85 th Percentiles = 0.4363 0.4959 0.5672 ⎡⎤ ⎣⎦ 15, 45, 85 th Percentiles = 0.4363 0.4959 0.5672 ⎡⎤ ⎣⎦ Auto-correlation = 0 2000 4000 ... 44000 46000 48000 1.000 0.9450 0.9056 ... 0.1180 0.0784 0.0394 ⎡⎤ ⎢⎥ ⎣⎦ Auto-correlation = 0 2000 4000 ... 44000 46000 48000 1.000 0.9450 0.9056 ... 0.1180 0.0784 0.0394 ⎡⎤ ⎢⎥ ⎣⎦ MPSRF = 400 800 1200 1600 ... 19600 20000 0.0396 0.0119 0.0014 0.0044 ... 0.0004 0.0004 ⎡⎤ ⎢⎥ ⎣⎦ MPSRF = 400 800 1200 1600 ... 19600 20000 0.0396 0.0119 0.0014 0.0044 ... 0.0004 0.0004 ⎡⎤ ⎢⎥ ⎣⎦ z = [7.5; 7.4; 6.8; 6.9; 6.3] Median = [0.5310] Median = [0.5310] Mean = [0.5293] Mean = [0.5293] Mode = [0.5336] Mode = [0.5419] 15, 45, 85 th Percentiles = 0.4606 0.5219 0.5966 ⎡⎤ ⎣⎦ 15, 45, 85 th Percentiles = 0.4606 0.5219 0.5966 ⎡⎤ ⎣⎦ Auto-correlation = 0 2000 4000 ... 44000 46000 48000 1.000 0.9453 0.9071 ... 0.1168 0.0776 0.0396 ⎡⎤ ⎢⎥ ⎣⎦ Auto-correlation = 0 2000 4000 ... 44000 46000 48000 1.000 0.9453 0.9071 ... 0.1168 0.0776 0.0396 ⎡⎤ ⎢⎥ ⎣⎦ 289 MPSRF = 400 800 1200 1600 ... 19600 20000 0.0119 0.0124 0.0039 0.0037 ... 0.0003 0.0004 ⎡⎤ ⎢⎥ ⎣⎦ MPSRF = 400 800 1200 1600 ... 19600 20000 0.0119 0.0124 0.0039 0.0037 ... 0.0003 0.0004 ⎡⎤ ⎢⎥ ⎣⎦ 5.2 ExpDecay, QuadrVAR: Lognormal Prior: ( ) ( ) 1.50 ; 0.823 μ=Σ= σ = [0.05; 0.10], θ = [10.0], x = [0.5; 1.0; 1.5; 2.0; 2.5; 3.0]; α = [1.0] z = [0.59; 0.36; 0.20; 0.13; 0.07; 0.04] Median = [1.0810] Median = [1.0810] Mean = [1.0870] Mean = [1.0870] Mode = [1.0689] Mode = [1.0623] 15, 45, 85 th Percentiles = 0.9663 1.0661 1.2097 ⎡⎤ ⎣⎦ 15, 45, 85 th Percentiles = 0.9663 1.0661 1.2097 ⎡⎤ ⎣⎦ Auto-correlation = 0 2000 4000 ... 44000 46000 48000 1.000 0.9488 0.9093 ... 0.1185 0.0789 0.0393 ⎡⎤ ⎢⎥ ⎣⎦ Auto-correlation = 0 2000 4000 ... 44000 46000 48000 1.000 0.9488 0.9093 ... 0.1185 0.0789 0.0393 ⎡⎤ ⎢⎥ ⎣⎦ MPSRF = 400 800 1200 1600 ... 19600 20000 0.1186 0.0241 0.0234 0.0059 ... 0.0010 0.0011 ⎡⎤ ⎢⎥ ⎣⎦ MPSRF = 400 800 1200 1600 ... 19600 20000 0.1186 0.0241 0.0234 0.0059 ... 0.0010 0.0011 ⎡⎤ ⎢⎥ ⎣⎦ z = [0.63; 0.34; 0.26; 0.10; 0.06; 0.02] Median = [1.0670] Median = [1.0670] Mean = [1.0724] Mean = [1.0724] Mode = [1.0581] Mode = [1.0578] 15, 45, 85 th Percentiles = 0.9575 1.0536 1.1884 ⎡⎤ ⎣⎦ 15, 45, 85 th Percentiles = 0.9575 1.0536 1.1884 ⎡⎤ ⎣⎦ Auto-correlation = 0 2000 4000 ... 44000 46000 48000 1.000 0.9503 0.9106 ... 0.1178 0.0784 0.0387 ⎡⎤ ⎢⎥ ⎣⎦ 290 Auto-correlation = 0 2000 4000 ... 44000 46000 48000 1.000 0.9503 0.9106 ... 0.1178 0.0784 0.0387 ⎡⎤ ⎢⎥ ⎣⎦ MPSRF = 400 800 1200 1600 ... 19600 20000 0.1487 0.0237 0.0350 0.0190 ... 0.0016 0.0013 ⎡⎤ ⎢⎥ ⎣⎦ MPSRF = 400 800 1200 1600 ... 19600 20000 0.1487 0.0237 0.0350 0.0190 ... 
0.0016 0.0013 ⎡⎤ ⎢⎥ ⎣⎦ 5.3 RiseFall, QuadrVAR: Normal Prior: 0.7 0.134 0.003 ; 0.2 0.003 0.110 μ − ⎛⎞ ⎛ ⎞ =Σ= ⎜⎟ ⎜ ⎟ − ⎝⎠ ⎝ ⎠ σ = [0.05; 0.10], θ = [1.0], x = [0.5; 1.0; 3.0; 5.5; 7.1]; α = [1.0; 0.4] z = [0.39; 0.44; 0.35; 0.22; 0.16] Median = [0.8781; 0.3856] Median = [0.8781; 0.3856] Mean = [0.8962; 0.3888] Mean = [0.8962; 0.3888] Mode = [0.8513; 0.3768] Mode = [0.8559; 0.3710] 15, 45, 85 th Percentiles = 0.7144 0.8551 1.0859 0.3244 0.3771 0.4517 ⎡⎤ ⎢⎥ ⎢⎥ ⎣⎦ 15, 45, 85 th Percentiles = 0.7144 0.8551 1.0859 0.3244 0.3771 0.4517 ⎡⎤ ⎢⎥ ⎢⎥ ⎣⎦ Auto-correlation = 0 2000 4000 ... 44000 46000 48000 1.000 0.9237 0.8861 ... 0.1137 0.0763 0.0381 ⎡⎤ ⎢⎥ ⎣⎦ Auto-correlation = 0 2000 4000 ... 44000 46000 48000 1.000 0.9237 0.8861 ... 0.1137 0.0763 0.0381 ⎡⎤ ⎢⎥ ⎣⎦ MPSRF = 400 800 1200 1600 ... 19600 20000 0.0450 0.0192 0.0030 0.0056 ... 0.0012 0.0013 ⎡⎤ ⎢⎥ ⎣⎦ MPSRF = 400 800 1200 1600 ... 19600 20000 0.0450 0.0192 0.0030 0.0056 ... 0.0012 0.0013 ⎡⎤ ⎢⎥ ⎣⎦ z = [0.35; 0.50; 0.42; 0.18; 0.10] Median = [0.9302; 0.4094] Median = [0.9302; 0.4094] Mean = [0.9493; 0.4143] Mean = [0.9493; 0.4143] Mode = [0.9012; 0.4061] Mode = [0.9379; 0.3843] 291 15, 45, 85 th Percentiles = 0.7692 0.9063 1.1389 0.3509 0.4013 0.4789 ⎡⎤ ⎢⎥ ⎢⎥ ⎣⎦ 15, 45, 85 th Percentiles = 0.7692 0.9063 1.1389 0.3509 0.4013 0.4789 ⎡⎤ ⎢⎥ ⎢⎥ ⎣⎦ Auto-correlation = 0 2000 4000 ... 44000 46000 48000 1.000 0.9280 0.8899 ... 0.1149 0.0754 0.0382 ⎡⎤ ⎢⎥ ⎣⎦ Auto-correlation = 0 2000 4000 ... 44000 46000 48000 1.000 0.9280 0.8899 ... 0.1149 0.0754 0.0382 ⎡⎤ ⎢⎥ ⎣⎦ MPSRF = 400 800 1200 1600 ... 19600 20000 0.0287 0.0202 0.0018 0.0043 ... 0.0008 0.0006 ⎡⎤ ⎢⎥ ⎣⎦ MPSRF = 400 800 1200 1600 ... 19600 20000 0.0287 0.0202 0.0018 0.0043 ... 0.0008 0.0006 ⎡⎤ ⎢⎥ ⎣⎦ 5.4 RiseFall, QuadrVAR: Lognormal Prior: 0.9 0.114 0.013 ; 0.4 0.013 0.140 μ − ⎛⎞ ⎛ ⎞ =Σ= ⎜⎟ ⎜ ⎟ − ⎝⎠ ⎝ ⎠ σ = [0.05; 0.10], θ = [1.0], x = [0.1; 0.5; 2.5; 4.2; 6.1; 9.1]; α = [0.5; 0.2] z = [0.17; 0.29; 0.60; 0.39; 0.31; 0.37] Median = [0.8156; 0.1858] Median = [0.8156; 0.1858] Mean = [0.8440; 0.1868] Mean = [0.8440; 0.1868] Mode = [0.7689; 0.1855] Mode = [0.7804; 0.1916] 15, 45, 85 th Percentiles = 0.6247 0.7906 1.0698 0.1571 0.1823 0.2163 ⎡⎤ ⎢⎥ ⎢⎥ ⎣⎦ 15, 45, 85 th Percentiles = 0.6247 0.7906 1.0698 0.1571 0.1823 0.2163 ⎡⎤ ⎢⎥ ⎢⎥ ⎣⎦ Auto-correlation = 0 2000 4000 ... 44000 46000 48000 1.000 0.8977 0.8604 ... 0.1118 0.0750 0.0371 ⎡⎤ ⎢⎥ ⎣⎦ Auto-correlation = 0 2000 4000 ... 44000 46000 48000 1.000 0.8977 0.8604 ... 0.1118 0.0750 0.0371 ⎡⎤ ⎢⎥ ⎣⎦ MPSRF = 400 800 1200 1600 ... 19600 20000 0.0100 0.0119 0.0015 0.0037 ... 0.0003 0.0003 ⎡⎤ ⎢⎥ ⎣⎦ 292 MPSRF = 400 800 1200 1600 ... 19600 20000 0.0100 0.0119 0.0015 0.0037 ... 0.0003 0.0003 ⎡⎤ ⎢⎥ ⎣⎦ z = [0.10; 0.17; 0.48; 0.57; 0.45; 0.20] Median = [0.5817; 0.2054] Median = [0.5817; 0.2054] Mean = [0.5984; 0.2069] Mean = [0.5984; 0.2069] Mode = [0.5503; 0.2072] Mode = [0.5686; 0.2024] 15, 45, 85 th Percentiles = 0.4623 0.5665 0.7328 0.1753 0.2017 0.2393 ⎡⎤ ⎢⎥ ⎢⎥ ⎣⎦ 15, 45, 85 th Percentiles = 0.4624 0.5665 0.7238 0.1753 0.2017 0.2393 ⎡⎤ ⎢⎥ ⎢⎥ ⎣⎦ Auto-correlation = 0 2000 4000 ... 44000 46000 48000 1.000 0.9136 0.8728 ... 0.1177 0.0789 0.0386 ⎡⎤ ⎢⎥ ⎣⎦ Auto-correlation = 0 2000 4000 ... 44000 46000 48000 1.000 0.9136 0.8728 ... 0.1177 0.0789 0.0386 ⎡⎤ ⎢⎥ ⎣⎦ MPSRF = 400 800 1200 1600 ... 19600 20000 0.0464 0.0071 0.0271 0.0352 ... 0.0010 0.0010 ⎡⎤ ⎢⎥ ⎣⎦ MPSRF = 400 800 1200 1600 ... 19600 20000 0.0464 0.0071 0.0271 0.0352 ... 
0.0010 0.0010 ⎡⎤ ⎢⎥ ⎣⎦ 5.5 Hill, QuadrVAR: Lognormal Prior: 3.0 0.563 0.133 0.016 5.0 ; 0.133 1.563 0.025 2.0 0.016 0.025 0.250 μ ⎛⎞ ⎛ ⎞ ⎜⎟ ⎜ ⎟ =Σ= − ⎜⎟ ⎜ ⎟ ⎜⎟ ⎜ ⎟ − ⎝⎠ ⎝ ⎠ σ = [0.05; 0.10], θ = [1.0], x = [0.0; 1.0; 3.0; 5.0; 20.0]; α = [2.5; 2.5; 2.5] z = [0.96; 1.11; 1.93; 2.20; 2.47] Median = [2.8092; 3.8973; 1.8212] Median = [2.8092; 3.8973; 1.8212] Mean = [2.8409; 3.9807; 1.8694] Mean = [2.8409; 3.9807; 1.8994] Mode = [2.7478; 3.7078; 1.8077] Mode = [2.7431; 3.7001; 1.8216] 15, 45, 85 th Percentiles = 2.5373 2.7750 3.1509 3.1591 3.7974 4.7961 1.4337 4.7728 2.3052 ⎡⎤ ⎢⎥ ⎢⎥ ⎢⎥ ⎣⎦ 293 15, 45, 85 th Percentiles = 2.5373 2.7750 3.1508 3.1592 3.7974 4.7961 1.4337 1.7728 2.3051 ⎡⎤ ⎢⎥ ⎢⎥ ⎢⎥ ⎣⎦ Auto-correlation = 0 2000 4000 ... 44000 46000 48000 1.000 0.9248 0.8864 ... 0.1155 0.0780 0.0394 ⎡⎤ ⎢⎥ ⎣⎦ Auto-correlation = 0 2000 4000 ... 44000 46000 48000 1.000 0.9248 0.8864 ... 0.1155 0.0780 0.0394 ⎡⎤ ⎢⎥ ⎣⎦ MPSRF = 400 800 1200 1600 ... 19600 20000 0.0141 0.0094 0.0065 0.0035 ... 0.0003 0.0003 ⎡⎤ ⎢⎥ ⎣⎦ MPSRF = 400 800 1200 1600 ... 19600 20000 0.0141 0.0094 0.0065 0.0035 ... 0.0003 0.0003 ⎡⎤ ⎢⎥ ⎣⎦ z = [0.95; 1.07; 1.82; 2.23; 2.51] Median = [2.8192; 4.0242; 1.8837] Median = [2.8192; 4.0242; 1.8837] Mean = [2.8441; 4.1146; 1.9421] Mean = [2.8441; 4.1146; 1.9421] Mode = [2.7545; 3.8579; 1.8699] Mode = [2.6789; 3.7009; 1.8983] 15, 45, 85 th Percentiles = 2.5395 2.7801 3.1543 3.2891 3.9215 4.9469 1.5001 1.8321 2.4000 ⎡⎤ ⎢⎥ ⎢⎥ ⎢⎥ ⎣⎦ 15, 45, 85 th Percentiles = 2.5395 2.7801 3.1543 3.2891 3.9215 4.9469 1.5001 1.8321 2.4000 ⎡⎤ ⎢⎥ ⎢⎥ ⎢⎥ ⎣⎦ Auto-correlation = 0 2000 4000 ... 44000 46000 48000 1.000 0.9270 0.8887 ... 0.1156 0.0763 0.0384 ⎡⎤ ⎢⎥ ⎣⎦ Auto-correlation = 0 2000 4000 ... 44000 46000 48000 1.000 0.9270 0.8887 ... 0.1156 0.0763 0.0384 ⎡⎤ ⎢⎥ ⎣⎦ MPSRF = 400 800 1200 1600 ... 19600 20000 0.0045 0.0103 0.0060 0.0035 ... 0.0004 0.0004 ⎡⎤ ⎢⎥ ⎣⎦ MPSRF = 400 800 1200 1600 ... 19600 20000 0.0045 0.0103 0.0060 0.0035 ... 
0.0004 0.0004 ⎡⎤ ⎢⎥ ⎣⎦ 294 B.6 Validation of Parameter Estimation Algorithms (verified in MATLAB) 6.1 ExpDecay, σ = [0.05; 0.10], θ = [10.0], Normal Prior: ( ) ( ) 0.75 ; 0.123 μ=Σ= x = [0.5; 0.6; 0.7; 0.8; 0.9] z = [7.6; 7.4; 7.3; 6.7; 6.6] TRUE = [0.5] MLE = [0.5008] MLE = [0.5008] MAP = [0.5231] MAP = [0.5191] z = [7.6; 7.4; 7.3; 6.7; 6.6] TRUE = [0.5] MLE = [0.4831] MLE = [0.4831] MAP = [0.5056] MAP = [0.4994] z = [7.5; 7.3; 6.8; 6.9; 6.3] TRUE = [0.5] MLE = [0.5174] MLE = [0.5154] MAP = [0.5367] MAP = [0.5402] 6.2 ExpDecay, σ = [0.05; 0.10], θ = [1.0], Uniform Prior: ( ) ( ) 1.5 ; 0.823 μ=Σ= x = [0.5; 1.0; 1.5; 2.0; 2.5; 3.0] z = [0.61; 0.37; 0.22; 0.14; 0.08; 0.05] TRUE = [1.0] MLE = [0.9966] MLE = [0.9966] MAP = [1.0237] MAP = [1.0233] z = [0.59; 0.36; 0.20; 0.13; 0.07; 0.04] TRUE = [1.0] MLE = [1.0463] MLE = [1.0463] MAP = [1.0742] MAP = [1.0825] z = [0.63; 0.34; 0.26; 0.10; 0.06; 0.02] TRUE = [1.0] MLE = [1.0222] MLE = [1.0222] MAP = [1.0626] MAP = [1.0709] 6.3 RiseFall, σ = [0.05; 0.10], θ = [1.0], Normal Prior: 0.7 0.034 0.013 ; 0.2 0.013 0.005 μ − ⎛⎞ ⎛ ⎞ =Σ= ⎜⎟ ⎜ ⎟ − ⎝⎠ ⎝ ⎠ x = [0.5; 1.0; 3.0; 5.5; 7.1] z = [0.35; 0.50; 0.42; 0.18; 0.10] TRUE = [1.0; 0.4] MLE = [0.9843; 0.3981] MLE = [0.9843; 0.3981] MAP = [0.5691; 0.2518] MAP = [0.5822; 0.2472] 295 z = [0.34; 0.50; 0.42; 0.18; 0.09] TRUE = [1.0; 0.4] MLE = [0.9711; 0.4023] MLE = [0.9711; 0.4023] MAP = [0.5637; 0.2539] MAP = [0.5725; 0.2494] z = [0.36; 0.50; 0.41; 0.19; 0.12] TRUE = [1.0; 0.4] MLE = [0.9911; 0.3910] MLE = [0.9911; 0.3910] MAP = [0.5756; 0.2493] MAP = [0.5772; 0.2482] 6.4 RiseFall, σ = [0.05; 0.10], θ = [1.0], Lognormal Prior: 0.5 0.014 0.013 ; 0.4 0.013 0.040 μ − ⎛⎞ ⎛ ⎞ =Σ= ⎜⎟ ⎜ ⎟ − ⎝⎠ ⎝ ⎠ x = [0.1; 0.5; 2.5; 4.2; 6.1; 7.3; 9.1] z = [0.05; 0.21; 0.53; 0.52; 0.41; 0.34; 0.25] TRUE = [0.5; 0.2] MLE = [0.5023; 0.2013] MLE = [0.5023; 0.2013] MAP = [0.5238; 0.2133] MAP = [0.5323; 0.2060] z = [0.05; 0.21; 0.52; 0.52; 0.39; 0.35; 0.25] TRUE = [0.5; 0.2] MLE = [0.5056; 0.2020] MLE = [0.5056; 0.2020] MAP = [0.5199; 0.2150] MAP = [0.5332; 0.2104] z = [0.04; 0.22; 0.57; 0.51; 0.39; 0.30; 0.24] TRUE = [0.5; 0.2] MLE = [0.5675; 0.2098] MLE = [0.5675; 0.2098] MAP = [0.5376; 0.2219] MAP = [0.5384; 0.2289] 6.5 Hill, σ = [0.05; 0.10], θ = [1.0], Lognormal Prior 3.0 0.563 0.133 0.016 5.0 ; 0.133 1.563 0.025 2.0 0.016 0.025 0.250 μ ⎛⎞ ⎛ ⎞ ⎜⎟ ⎜ ⎟ =Σ= − ⎜⎟ ⎜ ⎟ ⎜⎟ ⎜ ⎟ −− ⎝⎠ ⎝ ⎠ x = [1.E-6; 1.0; 3.0; 5.0; 20.0] z = [1.00; 1.14; 1.92; 2.27; 2.49] TRUE = [2.5; 2.5; 2.5] MLE = [2.4972; 2.4905; 2.4862] MLE = [2.4972; 2.4905; 2.4862] MAP = [2.7842; 3.6838; 1.8073] MAP = [2.8210; 3.7195; 1.8138] z = [0.96; 1.11; 1.93; 2.20; 2.47] TRUE = [2.5; 2.5; 2.5] MLE = [2.4592; 2.4835; 2.4921] MLE = [2.4592; 2.4835; 2.4921] MAP = [2.7478; 3.7078; 1.8077] MAP = [2.8040; 3.8381; 1.9092] z = [0.95; 1.07; 1.82; 2.23; 2.51] TRUE = [2.5; 2.5; 2.5] MLE = [2.5125; 2.8460; 2.7035] MLE = [2.5125; 2.8460; 2.7035] MAP = [2.7545; 3.8579; 1.8699] MAP = [2.7488; 3.8355; 1.8646] 296 B.7 Validation of Integration of the Region of Interest 7.1. 
B.7 Validation of Integration of the Region of Interest

7.1 First Trial: 2-d ROI – Ellipse and Rectangle

[Figure: the rectangular and elliptical 2-d regions of interest, with semi-axes r1 and r2]

A_rect = 4·r1·r2
A_ellipse = π·r1·r2

A_Rlarge = 4·5 = 20.0
A_Rsmall = 4·1.0·1.5 = 6.0
A_ellipse = π·1.0·1.5 = 4.7124

A_Rsmall / A_Rlarge = 0.3000
A_ellipse / A_Rlarge = 0.2356

∫ ROI_rect / Rlarge = 0.3005
∫ ROI_ellipse / Rlarge = 0.2359

7.2 Second Trial: 3-d ROI – Ellipsoid and Rectangular Prism

V_rect = 8·r1·r2·r3
V_ellipse = (4/3)·π·r1·r2·r3

Rlarge = [-0.5 ~ 0.5; 1.0 ~ 3.0; 2.0 ~ 5.0]
ROI_center = [0.25; 1.50; 3.00]; ROI_radius = [0.25; 0.50; 1.00]

V_Rlarge = 6.0
V_Rsmall = 8·0.25·0.50·1.0 = 1.0
V_ellipse = (4π/3)·0.25·0.50·1.0 = 0.5236

V_Rsmall / V_Rlarge = 0.1667
V_ellipse / V_Rlarge = 0.0873

∫ ROI_rect / Rlarge = 0.1670
∫ ROI_ellipse / Rlarge = 0.0871
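The analytical reference values in trials 7.1 and 7.2 follow directly from the standard area and volume formulas quoted above; restated in LaTeX with the radii given there:

A_{\mathrm{rect}} = 4 r_1 r_2 = 4(1.0)(1.5) = 6.0, \qquad A_{\mathrm{ellipse}} = \pi r_1 r_2 = \pi(1.0)(1.5) \approx 4.7124,
V_{\mathrm{rect}} = 8 r_1 r_2 r_3 = 8(0.25)(0.50)(1.0) = 1.0, \qquad V_{\mathrm{ellipsoid}} = \tfrac{4}{3}\pi r_1 r_2 r_3 \approx 0.5236.

The exact ratios against the bounding regions are therefore 6.0/20.0 = 0.3000, 4.7124/20.0 ≈ 0.2356, 1.0/6.0 ≈ 0.1667, and 0.5236/6.0 ≈ 0.0873, which the numerically integrated values 0.3005, 0.2359, 0.1670, and 0.0871 reproduce to within a fraction of a percent.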
Appendix C: Program Code for this Work

This appendix contains the code of the main program used to perform the computations found in this work. It provides a working demonstration of the XRedDesign API and shows how a researcher can design and evaluate experimental designs using this software. The actual XRedDesign API code is not provided here; refer to the documentation for the XRedDesign API in Appendix A for descriptions of the various classes and member variables and functions. The program compiles with either Microsoft Visual Studio 2005 or the Intel C++ compiler (version 9.0) and was run on Pentium-class (4, M, D, and Core 2) personal computers.

main.cpp

/*
// Main Program File for Huber Ph.D. Work
//
// copyright 2007
// David J. Huber <huber@usc.edu>
//
// This program runs all scenarios presented in this dissertation and
// contains an interactive mode (default) and a batch script mode, which
// works by placing identifiers on the command line that correspond to the
// operation to perform, model ID, covariance and error variance level,
// sample size, and design matrix.
//
// Example: ExeName -s 3 1 2 4 "1.000; 2.000; 3.000; 4.000"
// will run SSD for model 3 (CSigmoid) using low covariance and high error
// variance for a sample size of 4 using the string as the design matrix.
*/

#include "info.h"
#include "prior.h"
#include "experiment.h"
#include "models.h"

void createExperiment(CExperiment *, int, const char *[]);
void computeDesign(CExperiment &, const string &);
void computeSSD(CExperiment &, const string &);
void computeMPSRF(CExperiment &, string &);
void computeCredible(CExperiment &, const string &);
void computeAnalysis(CExperiment &, const string &);

int main(int argc, const char *argv[])
{
    using namespace std;
    char pcFBuffer[256], cProgram;

    cout << "*** DESIGN AND REDUCTION OF BAYESIAN EXPERIMENTS ***\n";
    if(argc >= 5) {
        cProgram = argv[1][1];
        sprintf_s(pcFBuffer, 256, "m%cc%cv%c", *argv[2], *argv[3], *argv[4]);
    }
    else {
        cout << "\n* Select a program option:\n\n";
        cout << "1.\tCompute the optimal design for an experiment\n";
        cout << "2.\tSimulate and analyze an experimental design\n";
        cout << "3.\tCompute posterior diagnostics for the experiment\n";
        cout << "4.\tDetermine the sample size score for a design\n";
        cout << "5.\tCompute the credibility interval for a design\n";
        cout << "Choose an option: ";
        cin >> cProgram;
        cout << "\nInput the filename to save the results (*.txt): ";
        cin >> pcFBuffer;
    }

    CExperiment e;
    createExperiment(&e, argc, argv);
    cout << "\n" << e << endl;

    string szOutputFile = addFileExtn(string(pcFBuffer), "txt");
    e.setFile(szOutputFile);

    CVector vZ, vStart;
    switch(cProgram) {
    case '1': case 'd': case 'D':
        computeDesign(e, szOutputFile);
        break;
    case '2': case 'a': case 'A':
        computeAnalysis(e, szOutputFile);
        break;
    case '3': case 'm': case 'M':
        computeMPSRF(e, szOutputFile);
        break;
    case '4': case 's': case 'S':
        computeSSD(e, szOutputFile);
        break;
    case '5': case 'c': case 'C':
        computeCredible(e, szOutputFile);
        break;
    default:
        ERROR("Unrecognized Input for Program Option");
    }

    cout << "Complete. Results written to file: " << szOutputFile << endl;
    return (argc == 1) ? exitConsole(EXIT_SUCCESS) : EXIT_SUCCESS;
}

void createExperiment(CExperiment *e, int argc, const char *argv[])
// Collects user-inputted data and fills the CExperiment object
{
    using namespace std;
    uint nSize;
    char cCho1, cCho2, cCho3;

    if(argc >= 5) {
        // Use the parameters inputted on the command line
        cCho1 = *argv[2];
        cCho2 = *argv[3];
        cCho3 = *argv[4];
        nSize = strtoul(argv[5], NULL, 10);
    }
    else {
        // Manually collect the experimental parameters interactively
        cout << "\n* Select the experimental model:\n\n";
        cout << "1.\tExponential Decay (1st order)\n";
        cout << "2.\tExponential Rise and Fall (2nd order)\n";
        cout << "3.\tHill Sigmoid\n";
        cout << "4.\tSallen-Key Low Pass Filter\n";
        cout << "5.\tFlourescence of Indocyanine Green\n";
        cout << "6.\tAnaerobic Threshold\n";
        cout << "Choose an option: ";
        cin >> cCho1;
        cout << "\nSelect the degree of parameter covariance:\n\n";
        cout << "1. Low\n2. High\nChoose an option: ";
        cin >> cCho2;
        cout << "\nSelect the variance parameter set:\n\n";
        cout << "1. Silent <g(t) = sq(0.02 + 0.05*y(t))>\n";
        cout << "2. Noisy <g(t) = sq(0.05 + 0.10*y(t))>\n";
        cout << "Choose an option: ";
        cin >> cCho3;
        cout << "\nEnter the number of observations for this design: ";
        cin >> nSize;
    }

    CVector vMean, vTradMin, vTradMax, vMaxInput;
    CMatrix mCov;
    switch(cCho1) {
    case '1':
        e->setModels(new CExpDecay, new CQuadrVAR);
        e->setTheta(1, "10.0");
        vMean.reinit(1, "0.50");
        if(cCho2 == '1') {       // small prior distribution
            mCov.reinit(1,1, "0.123");
            vTradMin.reinit(1, "0.00");
            vTradMax.reinit(1, "20.0");
        }
        else {                   // large prior distribution
            mCov.reinit(1,1, "0.490");
            vTradMin.reinit(1, "0.00");
            vTradMax.reinit(1, "50.0");
        }
        vMaxInput.reinit(1, "1.e+6");
        break;
    case '2':
        e->setModels(new CRiseFall, new CQuadrVAR);
        e->setTheta(1, "0.06");
        vMean.reinit(2, "0.7; 0.2");
        if(cCho2 == '1') {       // small prior distribution
            mCov.reinit(2,2, "0.044, -0.013; -0.013, 0.010");
            vTradMin.reinit(1, "0.00");
            vTradMax.reinit(1, "50.0");
        }
        else {                   // large prior distribution
            mCov.reinit(2,2, "0.176, -0.046; -0.046, 0.032");
            vTradMin.reinit(1, "0.00");
            vTradMax.reinit(1, "100.0");
        }
        vMaxInput.reinit(1, "1.e+6");
        break;
    case '3':
        e->setModels(new CSigmoid, new CQuadrVAR);
        e->setTheta(1, "0.5");
        vMean.reinit(3, "10.0; 1.5; 1.0");
        if(cCho2 == '2') {       // large prior distribution
            mCov.reinit(3,3, "20.25 0.133 0.016; \
                              0.133 0.203 -0.025; \
                              0.016 -0.025 0.063");
            vTradMin.reinit(1, "1.e-2");
            vTradMax.reinit(1, "1.e+4");
            vMaxInput.reinit(1, "1.e+7");
        }
        else {                   // small prior distribution
            mCov.reinit(3,3, "5.300, 0.017, 0.004; \
                              0.017, 0.051, -0.001; \
                              0.004, -0.001, 0.017");
            vTradMin.reinit(1, "1.e-3");
            vTradMax.reinit(1, "1.e+5");
            vMaxInput.reinit(1, "1.e+12");
        }
        break;
    case '4':
        e->setModels(new CLowPass, new CQuadrVAR);
        e->setTheta(2, "10.0; 10.0");
        vMean.reinit(2, "20.0; 45.0");
        mCov.reinit(2,2, "40.96 0; 0 107.5");
        vTradMin.reinit(1, "1.e-4");
        vTradMax.reinit(1, "1.e+3");
        vMaxInput.reinit(1, "1.e+4");
        break;
    case '5':
        e->setModels(new CFlourescence, new CQuadrVAR);
        vMean.reinit(3, "1382; 100.5; 15.10");
        mCov.reinit(3,3, "4.416E+5 0 0; 0 307.9 0; 0 0 9.060");
        vTradMin.reinit(1, "1.e-4");
        vTradMax.reinit(1, "1.e+0");
        vMaxInput.reinit(1, "1.e+2");
        break;
    case '6':
        e->setModels(new CAnaerobic, new CConstVAR);
        vMean.reinit(4, "0.0412; 20.15; 0.1746; 5.686");
        mCov.reinit(4,4, "0.001, 0.00, 0.000, 0.00; \
                          0.000, 10.98, 0.000, 0.00; \
                          0.000, 0.00, 0.002, 0.00; \
                          0.000, 0.00, 0.000, 66.80");
        vTradMin.reinit(1, "0.00");
        vTradMax.reinit(1, "300");
        vMaxInput.reinit(1, "300");
        break;
    default:
        ERROR("Unrecognized Model Name.");
    }

    // Now initialize the prior using the above mean and cov matrix
    if(cCho1 == '6') e->setPrior(new CResamplePrior, vMean, mCov);
    else e->setPrior(new CLognormalPrior, vMean, mCov);

    e->setSize(nSize);
    e->setVerbose(true);
    e->setMaxInput(vMaxInput);

    if(cCho1 == '6') e->setSigma(1, "0.90");
    else if(cCho3 == '1') e->setSigma(2, "0.02; 0.05");
    else e->setSigma(2, "0.05; 0.10");

    if(argc > 6) e->setX(nSize, e->getInputs(), argv[6]);
    else e->designTrad(vTradMin, vTradMax);
}

void computeDesign(CExperiment &e, const string &szOutputFile)
// Compute an EID-optimal design for a given number of observations
{
    int nMaxEvals, nSize = e.getSize(), nInputs = e.getInputs();
    CMatrix mX0(nSize, nInputs), mCov = e.getPrior()->getCov();
    optsRCr options;

    // Adjust the optimization parameters based on the experiment to design
    const char *pcModelName = e.getExpModel()->name().c_str();
    if(!strcmp(pcModelName, "Exponential Decay")) {
        nMaxEvals = 1000;
        options.fTolX = 3.e-3;
    }
    else if(!strcmp(pcModelName, "Exponential Rise and Fall")) {
        nMaxEvals = sq(100);     // 10000
        options.fTolX = 3.e-3;
    }
    else if(!strcmp(pcModelName, "Hill Sigmoid")) {
        nMaxEvals = pow(25,3);   // 25^3 = 15625
        options.fSmax = 2;
        options.fTolX = 3.e-4;
    }
    else if(!strcmp(pcModelName, "Sallen-Key LowPass Filter")) {
        nMaxEvals = pow(25,3);   // 25^3 = 15625
        options.fSmax = 2;
        options.fTolX = 3.e-4;
    }
    else if(!strcmp(pcModelName, "ICG Flourescence in Blood")) {
        nMaxEvals = pow(25,3);   // 25^3 = 15625
        options.fSmax = 2;
        options.fTolX = 3.e-4;
    }
    else if(!strcmp(pcModelName, "Anaerobic Threshold")) {
        ;   // No optimal design computed for this model
    }
    else ERROR("Unrecognized Model Name");

    // Now, compute the optimal design and display when finished
    cout << "Computing optimal design for " << pcModelName << " model\n";
    double fCrit = e.designOpt(new CInfoEID, nMaxEvals, options);
    cout << "\nOptimal design for " << nSize << " observations:\n" << e.getX();
    cout << "\nOptimal design criterion: " << fCrit << std::endl;
}

void computeMPSRF(CExperiment &e, string &szOutputFile)
// Compute the Markov chain diagnostics for the current experiment
{
    const uint nChains = 5000;
    optsMC options;
    options.nLength = 100000;

    // Compute the posterior diagnostic for the given experiment
    arrayU pnLength;
    arrayD pdRMS, pdMean, pdStDev;
    e.mpsrf(nChains, options, pnLength, pdRMS, pdMean, pdStDev);
}

void computeSSD(CExperiment &e, const string &szOutputFile)
// Compute the various SSD scores for the currently inputted design
{
    uint nChains = 5000;
    optsMC options;
    options.nBurnIn = 5000;
    options.nLength = 100000;

    CVector vScore;
    vScore = e.designSSD(new CSizeCombined, new CROIEllipse, nChains, options,
                         CVector(4, "0.80; 0.85; 0.90; 0.95"));
}

void computeCredible(CExperiment &e, const string &szOutputFile)
// Evaluate a design by estimating and computing 90% credibility intervals
{
    const uint T = 20;
    e.setVerbose(false);

    optsMC options;
    options.nBurnIn = 5000;
    options.nLength = 50000;

    CMatrix mCases, mCov = e.getPrior()->getCov();
    const char *pcModelName = e.getExpModel()->name().c_str();
    if(!strcmp(pcModelName, "Exponential Decay")) {
        if(mCov[0][0] == 0.490)
            mCases.reinit(4,1, "0.2175; 0.4903; 0.7705; 1.4505");
        else
            mCases.reinit(4,1, "0.1025; 0.2906; 0.8236; 2.3343");
    }
    else if(!strcmp(pcModelName, "Exponential Rise and Fall")) {
        if(mCov[0][0] == 0.176) {
            mCases.reinit(4,2, "0.5907, 0.1101; \
                                0.6443, 0.1945; \
                                0.5474, 0.3003; \
                                0.7381, 0.1701");
        }
        else {
            mCases.reinit(4,2, "1.0842, 0.0703; \
                                0.5694, 0.1280; \
                                0.7068, 0.0846; \
                                0.8919, 0.1373");
        }
    }
    else if(!strcmp(pcModelName, "Hill Sigmoid")) {
        options.nLength = 100000;
        if(mCov[0][0] == 20.25) {
            mCases.reinit(4,3, "7.7641, 1.2770, 0.8720; \
                                9.7451, 1.4833, 0.9918; \
                                12.2315, 1.7228, 1.1291; \
                                15.3523, 2.0011, 1.2829");
        }
        else {
            mCases.reinit(4,3, "5.9356, 1.0708, 0.7575; \
                                9.1192, 1.4366, 0.9699; \
                                14.0105, 1.9274, 1.2419; \
                                21.5252, 2.5859, 1.5901");
        }
    }
    else ERROR("Unrecognized Model Name");

    CVector vZ;
    CMarkovChain mcPosterior;
    CVector *pvMode = new CVector [T];
    CMatrix *pmCI = new CMatrix [T];

    char pcVol[] = "cred";
    string szFile = addFileVol(szOutputFile, string(pcVol));
    ofstream fout(szFile.c_str(), ios::out | ios::app);
    fout << "*** N = " << e.getSize() << endl;

    for(int j=0; j < 4; ++j) {
        for(int i=0; i < T; ++i) {
            // Update the progress indicator
            if(i == 0) cout << "*** Running Trial " << i+1 << " of " << T << flush;
            else cout << "." << flush;

            // Simulate the experiment from the designated parameter vector
            CVector vAlpha = mCases.row2vector(j);
            e.simulate(vZ, vAlpha, false);

            // Compute the MAP estimate and the 90% credibility interval
            pvMode[i] = e.estimateMAP(vZ, options, mcPosterior).getAlpha();
            pmCI[i] = mcPosterior.cred(0.90);
        }

        // Finally, write the results to the file
        fout << "Case " << j+1 << ":\tAlpha = [";
        for(int i=0; i < mCases.getCols(); ++i) {
            fout << mCases[j][i];
            if(i != mCases.getCols()-1) fout << ";\t";
            else fout << "]\n" << std::endl;
        }
        for(int p=0; p < e.getP(); ++p) {
            fout << "Parameter " << p;
            fout << "\nTrial\tMode\tCI_lower\tCI_upper\tCI_width" << endl;
            for(int k=0; k < T; ++k) {
                float fCIlower = pmCI[k][p][0], fCIupper = pmCI[k][p][1];
                fout << k << "\t" << pvMode[k][p] << "\t" << fCIlower << "\t";
                fout << fCIupper << "\t" << (fCIupper-fCIlower) << std::endl;
            }
            fout << "******\n" << std::endl;
        }
    }
    fout.close();
    delete [] pvMode;
    delete [] pmCI;
}

void computeAnalysis(CExperiment &e, const string &szOutputFile)
// Analyze a design by simulating experiments and evaluating the results
{
    const uint T = 5000;
    optsMC options;
    options.nBurnIn = 1000;
    options.nLength = 50000;
    optsMLE opts;

    // Write the individual error evaluations to a file
    char pcVol[8];
    sprintf_s(pcVol, 8, "N=%d", e.getSize());
    string szFile = addFileVol(szOutputFile, string(pcVol));
    ofstream fout(szFile.c_str(), ios::out | ios::trunc);
    fout << "EID\tInfo\tERR_mle\tERR_map\t1/Precision\n";

    CParam param;
    CMarkovChain mcPosterior;
    CVector vZ, vAlpha, vMAP, vMLE, vPRC_map(T), vERR_mle(T), vERR_map(T);

    e.setVerbose(false);
    float fEID = -log(e.info(new CInfoEID, 15625));

    for(uint i=0; i < T; ++i) {
        // Update the progress indicator
        if(i%100 == 0) cout << "\nRunning Trial " << i+1 << " of " << T << flush;
        else cout << "." << flush;

        // Simulate a data set from a random realization of the prior
        e.simulate(vZ, vAlpha);

        // MLE estimate of the parameter values
        opts.vStart = combine(vAlpha, e.getSigma());
        vMLE = e.estimateMLE(vZ, opts).getAlpha();
        vERR_mle[i] = (fabs(vMLE - vAlpha)/vAlpha).norm();

        // MAP estimate of the parameter values and precision
        vMAP = e.estimateMAP(vZ, options, mcPosterior).getAlpha();
        vERR_map[i] = (fabs(vMAP - vAlpha)/vAlpha).norm();
        vPRC_map[i] = mcPosterior.precision(vMAP).det();

        fout << fEID << "\t" << vERR_mle[i] << "\t";
        fout << vERR_map[i] << "\t" << 1.0/vPRC_map[i] << std::endl;
    }
    fout.close();
}
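As a usage note, the batch-mode example from the header comment maps onto the code above as follows (ExeName stands in for whatever the built executable is named):

    ExeName -s 3 1 2 4 "1.000; 2.000; 3.000; 4.000"

This selects sample size determination (option 's', handled by computeSSD()), model 3 (the Hill Sigmoid), covariance level 1 (low), error-variance set 2 (the noisy set, sigma = [0.05; 0.10]), a sample size of 4, and the quoted string as the design matrix passed to setX().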
Abstract
The costs of sampling are often quite high in biomedical engineering and medicine, where collecting data is frequently invasive, destructive, or time consuming. This results in experiments that are either sparse or very expensive. Optimal design strategies can help a researcher to make the most of a given number of experimental observations, but neglect the actual problem of sample size determination. For a grey-box experiment with continuous parameter and observation spaces, one must determine how many observations are required in order to ensure precise parameter estimates that resist experimental error and prior uncertainty in the parameter values. This work proposes a novel approach to sample size determination that bridges experimental science with principles of quality engineering and control. A population of parallel Markov chains is simulated from the preposterior distribution to generate posterior predictive distributions for a proposed experiment. This represents a collection of possible posterior distributions for the experiment over the entire observation space. One can compute the estimator precision and determine the optimal sample size as a measure of the probability that the experiment, on the average, will fail to yield a necessary degree of estimator precision. This work evaluates the proposed method by applying it to a combination of simulated and practical experiments that validate the utility of the algorithm and examine its properties under various prior distributions and degrees of experimental error. A specialized software package was created to carry out the computations necessary for precision-based sample size determination.
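Read schematically, the sample-size criterion described above amounts to choosing the smallest design size whose simulated posteriors are unlikely to miss the required precision; in symbols (the required precision p_req and the failure tolerance δ are introduced here only for illustration and are not notation taken from the text):

N^{*} = \min\left\{ N : \Pr\big[\, \mathrm{prec}_N < p_{\mathrm{req}} \,\big] \le \delta \right\},

with the probability taken over posterior distributions generated from the preposterior (posterior predictive) simulation of the proposed N-observation experiment.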
Asset Metadata
Creator: Huber, David J. (author)
Core Title: Precision-based sample size reduction for Bayesian experimentation using Markov chain simulation
School: Viterbi School of Engineering
Degree: Doctor of Philosophy
Degree Program: Biomedical Engineering
Publication Date: 10/25/2007
Defense Date: 06/19/2007
Publisher: University of Southern California (original); University of Southern California. Libraries (digital)
Tag: bayes; experiment design; OAI-PMH Harvest; sample size determination; system modeling
Language: English
Advisor: Yamashiro, Stanley (committee chair); Maarek, Jean-Michel (committee member); Schumitzky, Alan (committee member)
Creator Email: huber@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-m887
Unique identifier: UC1420958
Identifier: etd-Huber-20071025 (filename); usctheses-m40 (legacy collection record id); usctheses-c127-582267 (legacy record id); usctheses-m887 (legacy record id)
Legacy Identifier: etd-Huber-20071025.pdf
Dmrecord: 582267
Document Type: Dissertation
Rights: Huber, David J.
Type: texts
Source: University of Southern California (contributing entity); University of Southern California Dissertations and Theses (collection)
Repository Name: Libraries, University of Southern California
Repository Location: Los Angeles, California
Repository Email: cisadmin@lib.usc.edu