Close
About
FAQ
Home
Collections
Login
USC Login
Register
0
Selected
Invert selection
Deselect all
Deselect all
Click here to refresh results
Click here to refresh results
USC
/
Digital Library
/
University of Southern California Dissertations and Theses
/
Stochastic and multiscale models for urban and natural ecology
(USC Thesis Other)
Stochastic and multiscale models for urban and natural ecology
PDF
Download
Share
Open document
Flip pages
Contact Us
Contact Us
Copy asset link
Request this asset
Transcript (if available)
Content
STOCHASTIC AND MULTISCALE MODELS FOR URBAN AND NATURAL ECOLOGY by Maud Comboul A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL UNIVERSITY OF SOUTHERN CALIFORNIA In Partial Fulllment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (CIVIL AND ENVIRONMENTAL ENGINEERING) August 2012 Copyright 2012 Maud Comboul Table of Contents List of Figures iv Abstract vii Chapter 1: Motivation and Background 1 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 2 Mathematical Background . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.1 Bayesian Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.2 Stochastic Processes . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.3 Monte Carlo Techniques . . . . . . . . . . . . . . . . . . . . . . . . 12 Chapter 2: Analysis of the Value of Information in the Design of Resilient Water Distribution Sensor Networks 17 1 Single Stage Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 1.2 Mathematical Representation . . . . . . . . . . . . . . . . . . . . . . 21 1.3 Optimization Problem . . . . . . . . . . . . . . . . . . . . . . . . . . 23 1.4 Solution Method: the Greedy Algorithm . . . . . . . . . . . . . . . . 27 1.5 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . 30 1.6 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 2 Two-Stage Stochastic Optimization . . . . . . . . . . . . . . . . . . . . . . 38 2.1 Bayesian Update . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 2.2 Second Stage Optimization . . . . . . . . . . . . . . . . . . . . . . . 40 2.3 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . 41 2.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 3 EPANET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 3.1 Hydraulic and Quality Modeling Capabilities . . . . . . . . . . . . . . 46 3.2 Simulation Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . 47 Chapter 3: Evolutionary and Ecological Impacts of Disturbance Regimes on Vegetation Structures 52 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 2 Model Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 ii 2.1 Physiology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 2.2 Shading and Growth . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 2.3 Disturbances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 2.4 Environmental Stochasticity . . . . . . . . . . . . . . . . . . . . . . 62 2.5 Model Implementation . . . . . . . . . . . . . . . . . . . . . . . . . 62 3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 3.1 Forest Ecology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 3.2 Demographic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 3.3 Evolutionary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 Chapter 4: Multiscale Forest Dynamics 79 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80 2 Forest Model as a Point Process . . . . . . . . . . . . . . . . . . . . . . . . 84 2.1 Point Process Representation . . . . . . . . . . . . . . . . . . . . . . 84 2.2 Individual-Based Dynamics . . . . . . . . . . . . . . . . . . . . . . . 85 3 Coarse-Grained Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 3.1 Neighborhood Description . . . . . . . . . . . . . . . . . . . . . . . 87 3.2 Conditional Stochastic Dynamics . . . . . . . . . . . . . . . . . . . . 88 4 Computational Steps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 5 Results and Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 Chapter 5: Conclusion and Outlook 98 Bibliography 101 iii List of Figures 2.1 Two dierent spreads of the same pollution scenario due to variations in the water demands; the pollution source is indicated by the black star. The snapshot was taken at 10pm on day one in both cases. . . . . . . . . . . . 25 2.2 Ex7: Impact reduction function with respect to the number of sensors. Top gure: S 2 assumption. Bottom gure: S 3 assumption. . . . . . . . . . . . 35 2.3 Ex6: Assumption S 3 total impact for sample scenarios with sensor located at A 1 and A 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 2.4 Ex7: AssumptionS 3 , sensor placementsA 1 (top),A 2 (middle) andA 3 (bot- tom) and associated average nodal impacts, which are scaled up for visual- ization (the nodal impact cannot exceed the nodal demand) . . . . . . . . 37 2.5 net1: updated probability distribution function of the set of potential con- tamination scenarios I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 2.6 net1: Comparison between the impact generated with N rst stage sensor network and N rst stage + M second stage sensor network . . . . . . . . 44 2.7 net1: Comparison between the impact generated with N+M rst stage sensor network and N rst stage + M second stage sensor network . . . . . . . . 44 2.8 ex7: First stage and resulting second stage sensor networks after contamina- tion event . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 3.1 Interaction between trees. A three-tree conguration gives rise to ve dier- ent leaf area index values L j . . . . . . . . . . . . . . . . . . . . . . . . . 60 iv 3.2 Tree height distribution (shades of green) evololving in time (horizontal axis) during the canopy development. Heights range from 0 (bottom) to 25 meters at the top of the vertical axis . . . . . . . . . . . . . . . . . . . . . . . . . 64 3.3 Self-Thinning: Leaf mass average (horizontal axis) vs. forest density (vertical axis) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 3.4 In uence of dierent disturbance regimes on leaf area index . . . . . . . . . 66 3.5 In uence of dierent disturbance regimes on average height . . . . . . . . . 66 3.6 In uence of dierent disturbance regimes on net primary production . . . . 67 3.7 In uence of dierent disturbance regimes on biomass density . . . . . . . . 67 3.8 Starting with smooth Gaussian distribution for the LMA (dark green), the population evolves (in shades of yellow) into 2 species and eventually only 1species is left under frequent disturbance regime . . . . . . . . . . . . . . 68 3.9 Starting with a wide Gaussian distribution for the maturation height (dark green), same pattern occurs as for the LMA . . . . . . . . . . . . . . . . . 69 3.10 In uence of dierent disturbance regimes on leaf area index . . . . . . . . . 70 3.11 In uence of dierent disturbance regimes on average height . . . . . . . . . 71 3.12 In uence of dierent disturbance regimes on net primary production . . . . 71 3.13 In uence of dierent disturbance regimes on biomass density . . . . . . . . 72 3.14 Starting with smooth Gaussian distribution for the LMA (dark green), the dis- tribution spread slightly but seems unaected by changes in the disturbance regime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 3.15 Starting with smooth Gaussian distribution for the maturation height (dark green), the distribution spreads slightly and shifts to smaller maturation heights as time increases and as disturbance regimes become more frequent 74 3.16 Starting with smooth Gaussian distribution for the wood density (dark green), the spread of the distribution with time is less pronounced for more frequent disturbances. Again, in this case we can observe a slight shift to less dense wood species as disturbances become more frequent . . . . . . . . . . . . . 75 v 3.17 Starting with smooth Gaussian distribution for the seed size (dark green), the spread of the distribution with time is less pronounced for more frequent disturbances. Here, we can observe a strong shift to forests producing smaller seed sizes as disturbances become more frequent . . . . . . . . . . . . . . . 76 4.1 Coarse-grained Interaction between trees: light values at dierent depths are homogenized across the patch . . . . . . . . . . . . . . . . . . . . . . . . 89 4.2 Eective canopy openness trajectories for a sample of trees: values are ob- tained from the ne scale simulations. Time units are years . . . . . . . . . 94 4.3 Probability distribution functions of canopy openness conditioned on dierent Leaf Area Index values of the forest canopy . . . . . . . . . . . . . . . . . 94 4.4 Forest size distribution at steady state for the ne scale model and the coarse scale model, top and bottom gures respectively . . . . . . . . . . . . . . . 95 4.5 Time evolution of particular forest aggregate properties for the ne scale model and the coarse scale model, top and bottom gures respectively . . . 96 4.6 Forest conguration for the ne scale model and the coarse scale model, top and bottom gures respectively . . . . . . . . . . . . . . . . . . . . . . . . 97 vi Abstract This research re ects on both particular cases of ecologies: sensor networks in urban water distribution systems and forest dynamics under changing disturbance regime, and the elabo- ration of exploration tools to investigate and assess those complex ecologies. The predictive models that emerged from this work rely on sophisticated computer simulations designed based on stochastic and multiscale principles, and contribute to characterizing and evaluating the knowledge and information that is available about the systems under investigation. This thesis is organized into three distinct projects, each of which confronts specic challenges encountered when modeling complex systems. The rst project considers the monitoring of water distribution networks where we describe a stochastic parameterization and analysis of uncertainty for the design of single-stage as well as two-stage sensor networks aimed at max- imizing the probability of detection of accidents and intrusions in water distribution systems. Next, this study explores the ecological and evolutionary impacts of dierent disturbance regimes (generated following a stochastic Poisson process) on forests using the framework of a spatially explicit and individual-based forest model designed around four functional traits of trees. The nal project investigates a multiscale characterization of forest dynamics using Monte-Carlo simulations of the ne scale dynamics to synthesize a coarse-grain stochastic model describing the dynamics of the system on larger spatial scales. In addition to model- ing aspects, all three projects yield adequate computational performance, thereby assessing a recurrent challenge associated with the computational feasibility and performance of relevant numerical algorithms. vii Chapter 1 Motivation and Background 1 Introduction Environmental policies must be decided in the face of complexity. However even through the eye of an expert, complexity obstructs ones ability to foresee the evolution of a system. From complex systems emerge new properties, unobservable at the level of the individual constituents, and preventing from studying simpler sub-systems experimentally. Whether con- cerning urban infrastructures or environmental systems, much of the complexity lies internally, and is manifested in diculties in apprehending the interacting and feedback processes, dis- parity of scales, spatial heterogeneity and nonlinear dynamics. And much of it lies in the local to global relationship: policies altering local ecologies cause global ripples and global change aect ecologies locally. Here, ecology is used broadly speaking, as the study of the interactions between living organisms and their environment and among themselves. Complex systems are prevalent in the empirical world where available representations are limited to noisy measure- ments of reality. Sequences of observation can be noisy, erroneous because of measurement error and/or poorly represent the system because of scale disparities (the sample is too small in space or too short in time compared to the scale of the phenomenon under scrutiny). The 1 latter point is the main constraint on capturing processes that while monitored today have been happening over very long time scales, such as evolutionary scales in biological systems. Understanding those systems yet often relies on data. Thus increasingly, scientist and engi- neers employ a variety of modeling and signal processing techniques in order to manage these complex systems and analyze the existing data. In this realm, my research addresses the design and development of predictive models to analyze, value and project the knowledge and information available about a system. More specically, this thesis re ects on both some particular cases of ecologies: urban water distri- bution networks and forest dynamics, and the elaboration of exploration tools to investigate and assess those complex ecologies. Mathematical modeling help us grasp the essence of a system through the identication and the simplication of the dierent processes involved. Computer simulations additionally enable dynamics to take place, thereby letting us observe behaviors and evolutions. On the one hand, models and simulations dont have agency, but on the other hand models combined with policy can have major impacts. However, this combination, which has been used in the past, should be exercised with caution; in Europe, the failure of the engineered forests of the late eighteenth century exemplies the risks of applying theoretical models in nature (Scott [1998]). There are several aspects in designing a model that must be considered in order to make it practical to policy makers: Actions usually occur at the local level. For that reason, global models can be irrelevant as they do not give agency locally. By locality, I include spatial as well as temporal scales. On the contrary, too much detail in a model may lead to loss in the value of 2 information. Therefore, a model must communicate information at a pertinent scale, one that is engaging to policy makers. The ability to evaluate the condence in model outcomes is crucial. However, large uncertainties place severe limitations on agency. Thus, the best way to empower decision makers is to increase condence in the model outcome. The scale relevance issue is a dicult one; indeed, as an example, let us examine the population problem. Population dynamics are often known and therefore can only be described at the level of the individuals and their interactions. However, quantities of interest to decision- making evolve on distinct, usually much larger spatial and temporal scales than the one of the individual agents. The computational power needed to compute the evolution of those demographic distributions remain in some cases elusive. Through modeling and simulation, it is important to create bridges between scales. Up or down-scaling methods are currently being developed to nd equivalent dynamics from the scale where the science is known to the scale of interest. To quantify the condence level in model outcome, again a stochastic approach is applied in this thesis. From a scientic point of view, stochasticity in models takes care of the unknowns of the system. This approach makes a lot of sense in the realm of complex systems, which are dicult to comprehend. Stochastic models also allow for the quantication of the uncertainties in the predictions obtained from simulation. When the uncertainty is too high, thereby making the model results unusable in the decision-making process, it is in some cases possible through simulation runs to determine where or when observations must be made in order to decrease the uncertainty in the results. 3 Being essential to policy-makers, I have chosen to tackle these ideas through three case studies. Each project, which illustrates a particular case of ecological issue present locally, confronts some of the challenges encountered when modeling complex systems. Large scale urban infrastructures can exhibit complex spatial behaviors requiring the con- tinual monitoring of their performance. Chapter two describes a stochastic parameterization and analysis of uncertainty for the design of sensor networks aimed at maximizing the prob- ability of detection of accidents and intrusions in water distribution networks. In addition to the absence of information about possible contamination events, the study takes into account stochastic water demands as well as imperfect sensors. The high computational cost is over- taken through the use of submodular cost functions, which allows us to solve the optimization problem with a greedy algorithm. The results show high sensitivity of the optimal sensor network layout on the uncertainty in the stochastic parameters as well as individual sensor specications and accuracy. The analysis concludes with a two-stage optimization approach where an additional sensor network is positioned in the network as soon as a contamination event is detected by the sensor network in place. The chemical concentration pattern measured by the rst stage sensors allows us to use the Bayesian formalism to update the probability distribution function that is assumed on the set of possible contamination scenarios. The optimization problem formulated with the updated pdf is resolved with the greedy algorithm. We show that the resulting second stage network is drawn around the source of contamination and polluted regions of the water distribution system. Chapter three explores the ecological and evolutionary impacts of dierent disturbance regimes on forests using the framework of a spatially explicit and individual-based forest model designed around four functional traits of trees, namely leaf mass per unit area, maturation 4 height, wood density and seed size. The simulated forests are subjected to dierent distur- bance regimes that are generated following a stochastic Poisson process. The ndings of this study facilitate the assessment of the short-term as well as the long-term impacts of dierent disturbance regimes on forests. Because individual-based models still demand high computational costs and because they don't necessarily focus on the scale that matters to policy-makers, chapter four investigates a multiscale characterization of forest dynamics. At the ne scale, an individual-based point process model is used to describe ecological processes relevant to evolution of individual trees and their mutual interactions. At the coarse scale, a stochastic model is synthesized from ne-scale simulations to describe the dynamics of the forest on larger spatial scales. The coarse model is no longer based on the physics like the individual-based model, but is stochastically driven according to distributions evolving from the interactions at the ne-scale. 2 Mathematical Background The complexity present in the natural and urban systems studied in this thesis is assessed using sophisticated computer simulations and notions of probability theory, stochastic processes and statistical learning. I will recall some of those notions in the next few sections. 2.1 Bayesian Inference Bayesian statistics explicitly recognize and utilize four components of knowledge. Prior infor- mation, which is incorporated by specifying prior probability distributions for the parameters, and data, made from measurements and observations, are combined in a model to generate 5 posterior knowledge. The Bayesian formalism is natural and general, and is applicable to any kind of event. Let us describe the important rules for conditional probabilities. The conditional probability P (EjH) that the event E occurs if hypothesis H occurs is P (EjH) = P (E\H) P (H) ; P (H)6= 0 Thus P (E\H) =P (EjH)P (H); E andH are independent events ifP (E\H) =P (H)P (E) or equivalently ifP (EjH) = P (E), which implies that the knowledge that one event has occurred does not change the probability of the other. On the other hand, if P (EjH)6= P (E), then the two events are correlated. If we think of all the (mutually exclusive) hypotheses H i that could condition the eventE, how can we determine the probability ofH i ifE has occurred? This process is called inference, i.e. having observed an event, we want to assess the probability of each possible cause. The standard form of Bayes" theorem can be stated as P (H i jE) = P (EjH i )P (H i ) P j P (EjH j )P (H j ) Where the denominator is the normalization factor such that P i P (H i jE) = 1, P (H i ) is the prior probability of H i before E has occurred, P (H i jE) is the posterior probability of H i after observations have been made and P (EjH i ) is called the likelihood, i.e. the probability that H i caused E to occur. We can generalize the theorem to work with probability density functions (pdf). Letx2X be some unknown variables and y2Y be the data, we can compute the posterior pdf p(xjy) given the prior p(x) and the likelihood function p(yjx) as p(xjy) = p(yjx)p(x) R X p(yjx 0 )p(x 0 )dx 0 6 However, it is often intractable to compute the normalizing factor at the denominator. The Monte Carlo techniques introduced in section 2.3 are often applied to solve integration problems in large dimensional spaces. 2.2 Stochastic Processes A stochastic processfx t ; t2 Tg is a family of random variables indexed by the parameter set T where t often refers to time. x t can have a discrete or a continuous state space and the parameter set T may also be discrete or continuous. It is worthwhile to note that a stochastic process is in fact a function of the parameter t as well as the probability parameter !. Sofx t ; t2Tg actually stands forfx t (!); t2T; !2 g. For each t, x t (:) is a random variable and for each !, x:(!) is a realization of the process. For a nite setft 1 ;:::;t n g in T , the joint distribution function of x t 1 ;:::;x tn is called a nite dimensional distribution of the process. If one can specify the nite dimensional distribution F (x t 1 ;:::;x tn ), the joint density functionp(x t 1 ;:::;x tn ) or the joint characteristic function xt 1 ;:::;xtn (u 1 ;:::;u n ) for all nite setsft 1 ;:::;t n g in T then the stochastic process is statistically determined. Stochastic processes can have interesting properties described next. Stationarity: a stochastic process is said to be stationary if its probability law is invariant to translations in time. In other words: p(x t 1 ;:::;x tn ) =p(x t 1 + ;:::;x tn+ ) 8ft 1 ;:::;t n g2T;82T (1.1) Ergodicity: a stationary process is ergodic if E(x t 1 )<1, then 1 n P n k=1 x t k !E(x t 1 ) with probability 1. Intuitively, a process is ergodic when its statistics taken over time or space are equivalent. 7 Mixing: let (s) denote the strong mixing coecient and (s) = supfjp(A\B)p(A)p(B)j : 1<t<1; A2fx 1 ;:::;x t g; B2fx t+s ;:::;x 1 gg (1.2) If (s)! 0 decays exponentially as s!1 then the process is strongly -mixing Markov Processes Markov processes are the stochastic analog to ordinary dierential equations in deterministic dynamical systems. That is, the probability law for the future state depends on the present only. A stochastic processfx t ; t2Tg is a Markov process if p(x tn jx t 1 ;:::;x t n1 ) =p(x tn jx t n1 ) 8ft 1 ;:::;t n g2T; t 1 <t 2 <:::<t n (1.3) Now, considering the joint density function p(x t 1 ;:::;x tn ), by denition we can write p(x t 1 ;:::;x tn ) =p(x tn jx t 1 ;:::;x t n1 )p(x t 1 ;:::;x t n1 ) =p(x tn jx t n1 )p(x t 1 ;:::;x t n1 ) (1.4) Going on, we obtain p(x t 1 ;:::;x tn ) =p(x tn jx t n1 )p(x t n1 jx t n2 ):::p(x t 2 jx t 1 )p(x t 1 ) (1.5) The conditional probability densities are called the transition probability of the process and knowing all the transition probabilities p(x t jx ) for all t > 2 T and all p(x t ) species the probability law of the Markov process. For example the random walk and the Brownian motion are a (Gauss)-Markov process. A Markov process which takes discrete values is called a Markov chain. If the chain is also dis- crete in time, the transition probabilities areP (X n+1 =jjX n =i), wherei;j2S a countable 8 set, and if these do not depend on n, then the chain is time-homogeneous and fully specied by its transition matrix T where the elements of the matrix T ij = P (X n+1 = jjX n = i) represent the probability of going from state i to state j. Studying the properties of the transition matrix can provide a lot of information about the process (see notions of recurrent, transient and absorbent states). Let (T n ) ij denote theij th element of T n , then we say that state i communicates with state j if9n > 0 such that (T n ) ij > 0. If j also communicates with i, then they intercommunicate. If all pairs of states intercommunicate, thenT is irreducible. A distribution over the state space is invariant ifT =. Therefore, the invariant distribu- tions of the process are the eigenvectors of the matrixT such thatT =. If the state-space S is nite, then exits and if the invariant distribution is unique, then the Markov chain is ergodic. A Markov chain converges to a target distribution if it is irreducible and aperiodic (should not repeat cyclicly). Point Processes In one dimension, point processes are useful to model the random time sequence representing the occurrence time of a particular event. In higher dimensions, spatial point processes can model random patterns of points where the number of points as well as their locations are random. Finally, space-time point processes can represent random point patterns and their time-evolution. In one dimension, we may look at the arrival times T 1 <T 2 <::: where T i is the occurrence time of the i th event. Or we may analyse the inter-arrival times S i = T i+1 T i , which are independent random variables for some spacial cases (e.g. Poisson processes). 9 The counting process, namely the number of points arriving up to time t, is dened as N t = P 1 i=1 1fT i tg. In higher dimension, one can only count the number of points falling in a particular region so for each bounded, closed set BR d , we have N(B) = number of points falling in B Formally, a point process may be dened as a random measure where the values N(B) 0 and N(B)<1 with probability 1, for all bounded BR d . Assuming the process simple implies that there be at most one point at any point x2R d with probability 1. For an in depth treatment of point processes, see Daley and Vere-Jones [2003], Baddeley [2007], Baddeley et al. [2006], Mller [2003]. Extensive applications in the eld of forest dynamics can also be found in Gavrikov and Stoyan [1995], Mateu et al. [1998], Comas and Mateu [2007], Comas and Mateu [2008], S arkk a [2006], S arkk a and Renshaw [2006], Stoyan and Stoyan [1998], Stoyan and Penttinen [2000], Busing [1991] and Busing and Mailly [2004]) as well as in the third chapter of this thesis. Poisson Processes In one dimension, the Poisson process is a continuous-time counting process N(t); t 0, where N(t) is the number of events which have occurred up to time t. N(0) = 0 and the time between successive events is exponentially distributed with parameter (i.e. P (S i <s) = 1e s ; s> 0) and the increments are independent. Therefore, N(t) has a Poisson distribution with parameter t, i.e. P (N(t) =k) = (t) k k! e t . For inhomogeneous Poisson processes, the rate parameter varies with time so = (t) and 10 is called the intensity function. The Poisson process can also be dened as a continuous-time Markov chain or jump process with transition probabilities dened by T ij (s;t) =P (N(t) =jjN(s) =i) . If T ij (s;s +) = T ij (0;) then the process is time-homogeneous and is represented by a matrixT =e G where G ij = 8 > > > > > > > < > > > > > > > : if j =i if j =i + 1 0 otherwise The Poisson process can be generalized in two-dimensions (as well as inR d ) and is dened the following way Baddeley [2007]: The spatial Poisson process with uniform intensity > 0 is a point process inR 2 such that: for every bounded, closed set BR 2 , the countN(B) has a Poisson distribution with mean Area(B) if B 1 ;:::;B m are disjoint regions, then N(B 1 );:::;N(B m ) are independent The Poisson process represents complete spatial randomness and is commonly used in the modeling of forest res (see an example in chapter three and detailed models in Corral et al. [2007], Henley [1989] and Drossel and Schwabl [1992]). 11 2.3 Monte Carlo Techniques Monte Carlo (MC) methods (Metropolis and Ulam [1949], Robert and Casella [2004], Doucet et al. [2001a] and Mosegaard and Tatantola [1995]) use random processes to estimate de- terministic quantities, especially parameters of probability distributions. In principle, if one can generate many samples of a process of interest, then, by the law of large numbers, the empirical frequencies of the simulations are guaranteed to converge to the true sampling dis- tribution at the rate of 1 p n (central limit theorem), where n is the sample size. A classic example uses Monte Carlo simulations to estimate basic integrals. While traditional numerical methods may be more ecient in computing one-dimensional integrals, MC methods have the advantage of being unaected by high dimensional spaces. Let say we would like to compute some expected value of the type I(f) = Z X f(x)p(x)dx (1.6) wherefp(x); x2Xg is a probability distribution, then we can use Monte Carlo simulations to approximate it by ^ I N (f) = 1 N N X k=1 f( k ) (1.7) Wheref k g k>0 is a sequence of independent and identically distribution random variables with common probability law p(x). Because it can be challenging to sample from a probability distribution p(x), in the next few sections, we will describe some alternative algorithms called Markov chain Monte Carlo methods that are widely used in practice to solve this problem. 12 Rejection Sampling Let p(x) be the probability distribution function of a random variable from which we wish to sample, then, if there is a distribution q(x) from which we can easily sample, satisfying the following condition for all x: p(x)kq(x); q(x)> 0$p(x)> 0; k2R Then, one can use the following proposition (Pardoux [2009]) Proposition 2.1 Let (X n ;U n ) n1 be a sequence of independent random vectors where for each n 1, X n and U n are independent, X n q(x) and U n U(0; 1). Let N = inffn 1; U n p(X n ) kq(X n ) g and X =X N . The random variable X follows p(x) Algorithm 1 Rejection sampling Initialize X 0 Generate a candidate sample from proposal distribution : ^ X t q(x) Generate U t U(0; 1) Compute acceptance probability:A( ^ X t ) = p( ^ X t ) kq( ^ X t ) Accept candidate sample if U t <A( ^ X t ) and t t + 1 The drawback of this method occurs if the bounding constant k is too large, then the acceptance rate 1 k will be small. The Importance sampling technique tries to optimize the choice of the proposal distribution by minimizing the variance of the estimator ^ I N (f) Importance Sampling Let us rewrite equation 1.6 by including the importance weight dened by w(x) = p(x) q(x) : I(f) = Z f(x)w(w)q(x)dx (1.8) 13 Then a possible Monte Carlo estimate of I(f) using q(x) to draw samples from is: ^ I N (f) = 1 N N X k=1 f(X k )w(X k ) (1.9) One can optimize the choice of q(x) by minimizing var q(x) (f(x)w(x)) =E q(x) (f 2 (x)w 2 (x))I 2 (f) Markov Chain Monte Carlo Methods In Equation 1.7, when the state space is very large, one way to sample from distribution p(x) is to use Markov Chain Monte Carlo (MCMC) algorithms. Those methods initially introduced by Metropolis rely on the simulation of an irreducible Markov chain X t ; t> 0, which admits p(x) as its invariant distribution so that the distribution ofX t approachesp(x) for large times t. It turns out that ergodicity and nding transition probabilities that satisfy the detailed balance equation with p(x), are sucient conditions to characterize a Markov chain which accepts p(x) as its stationary distribution. Therefore we need to nd a transition probability T (xjy), which satises: p(X t )T (X t1 jX t ) =p(X t1 )T (X t jX t1 ) (1.10) T (X t jX t1 ) T (X t1 jX t ) = p(X t ) p(X t1 ) (1.11) Once the transition kernelT (xjy) is dened, one can simulate a Markov chainfX t ; t 0g starting with an initial guess X 0 and projecting that guess forward in time via T (X t1 jX t ) such that an estimate of R f(x)p(x)dx based on the ergodic theorem for Markov chains is: 1 N T+N X t=T+1 f(X t ) Where T represents the size of the discarded value set referred to as "burn-in". 14 Given a denition for the acceptance probability denoted by A(x;y), the Metropolis- Hasting works the following way: Algorithm 2 Metropolis-Hasting Algorithm Initialize X 0 Generate a candidate sample from proposal transition distribution : ^ X t T ( ^ X t jX t1 ) Generate U t U(0; 1) If U t <A(X t1 ; ^ X t ) then X t = ^ X t else X t =X t1 The acceptance probability can be chosen in many dierent ways Candy [2009]: Metropolis sampling: setA(x;y) = min(1; p(x) p(y) ) Hasting generalized case: setA(x;y) = min 1; p(x)T (yjx) p(y)T (x;y) Simulated annealing: setA(x;y) = min 1; p(x) 1= T (yjx) p(y) 1= T (x;y) ! and adjust the parameter at each step according to some decreasing rule. Gibbs sampler: assumingx is anndimensional vector and we know the full conditional p(x t jx 1 ;:::;x t1 ;x t+1 ;:::;x n ) and let us denote byx t = (x 1 ;:::;x t1 ;x t+1 ;:::;x n ). By dening the proposal transitions for t = 1;:::;n as follows: T (^ x (i) jx (i1) ) = 8 > > > < > > > : p(^ x (i) t jx i1 t ) if ^ x (i) t =x i1 t 0 otherwise Then, the acceptance probability is always one and the Gibbs algorithm is a special case of the Metropolis-Hasting algorithm described above: Algorithm 3 Gibbs Algorithm Initialize x (0) Sample x (i+1) t p(x t jx (i+1) 1 ;:::x (i+1) t1 ;x (i) t+1 ;:::;x (i) n ), for all t = 1;:::;n 15 Extensive applications of Markov chain Monte Carlo methods can be found in Katsoulakis and Vlachos [2003], Katsoulakis et al. [2003b], Katsoulakis et al. [2003a], Katsoulakis and ans A. Sopasakis [2006] and Katsoulakis and Trashorras [2006] related to stochastic multiscale approaches (see also chapter four). 16 Chapter 2 Analysis of the Value of Information in the Design of Resilient Water Distribution Sensor Networks Abstract The ability to monitor the ow as well as how water transforms throughout water networks would greatly improve the management of those distribution systems. The sensor placement problem attempts to nd the locations of monitoring devices that would optimally observe water quality and protect consumers from accidents and intrusions of contaminants. In some related critical scenarios, the absence of information about possible contamination events in- cluding knowledge of the injection sources, contaminant types or mass and time of pollution as well as the variability of water network input parameters, such as the nodal demands and the pipe roughness coecients, raises several challenges that must be addressed by the sensor network design process. In this chapter, we describe a stochastic parameterization and anal- ysis of uncertainty for the design of sensor networks aimed at maximizing the probability of detection of accidents and intrusions in water distribution networks. A challenge with such an approach, in particular when applied to large urban water networks, is the size of the ensuing 17 computational model and the associated numerical optimization problem. This problem is compounded in the presence of uncertainty, where the need to deal with a large number of statistically plausible scenarios underscores the need for ecient yet credible algorithms for addressing the sensor placement issue. In addition to modeling aspects, we also address a challenge associated with the computational feasibility and performance of relevant numerical algorithms. Specically, and through the use of submodular cost functions, we are able to solve the optimization problem with a greedy algorithm, yielding sucient computational per- formance that allows us to describe stochastic water demands in the context of Monte Carlo simulations. We further account for imperfect sensors while maintaining submodularity. Our results show that the optimal sensor network layout is highly dependent on whether uncertainty in the stochastic parameters has been introduced and whether individual sensors can fail or not. We end the analysis with a two-stage optimization approach where an additional sensor network (or second stage sensors) is added as soon as a contamination event is detected by the sensor network in place (rst stage sensors). The chemical concentration pattern measured by the rst stage sensors allows us to use the Bayesian formalism to update the probability distribution function that is assumed on the set of possible contamination scenarios. The optimization problem formulated with the updated pdf is resolved with the greedy algorithm. Using this technique, we show that we can relax the global alarm assumption, which is im- practical during a real life water network pollution event. Indeed, the resulting second stage sensor placement establishes boundaries around the source of contamination and the polluted regions of the water distribution system. 18 1 Single Stage Analysis 1.1 Introduction Functionality of water distribution systems is ascertained both by water pressure and water quality at all distribution points. These systems exhibit complex spatial behavior re ecting heterogeneity, variability and time-dependence in demand. Verifying their safe and reliable operation would thus benet from monitoring water distribution systems with sucient spatial coverage to ensure identiability and controllability of their dynamics. Sensor networks permit the simultaneous measurements of a distributed system at a number of observation nodes and, when used in conjunction of predictive models of these systems, could serve as an early warning system against accidents and unauthorized intrusion on water distribution systems. The design of these networks, both in terms of their spatial layout and the modality of their measurement, should aim to minimize the impact of such occurrences as the potential inltration of hazardous contaminants and loss of pressure due to leaks or breakage. We will address, in this chapter, the challenge of mitigating the consequences of contaminants introduced into a water distribution system. Water distribution networks are centralized around urban areas, which are major consump- tion sites. Unnoticed, the pollution of a municipal drinking system with harmful chemicals could have a catastrophic impact on public health. Hence networks should be monitored with some level of spatial resolution in order to prevent the inltration of a pollutant from spreading undetected. Due to practical constraints, only a limited number of sensors are usually avail- able, therefore, they should be placed at strategic nodes in the network so as to eciently detect an arbitrary contamination. The optimal sensor placement problem has been the sub- ject of a number of investigations Ostfeld and Salomons [2004], Berry et al. [2005],Carr et al. 19 [2006],Watson et al. [2006],Propato [2006], Shastri and Diwekar [2006],Leskovec et al. [2007], Krause et al. [2008] and Weickgenannt et al. [2010]. For an extensive review on the subject, see Hart and Murray [2010]. Primarily, the diculty comes from the absence of knowledge about potential contamination events, which can take many forms depending on the source of inltration, the substance, and the instance of occurrence, the later being closely correlated with the water demand, and therefore the ow pattern and the pollution spread. The opti- mization should thus be conducted over a very large set of predened contamination scenarios. Consequently, it can be computationally prohibitive to perform the optimization on large net- works with tens of thousands of nodes. The exploration of ecient solution approaches vary in the literature but are mostly heuristic approaches (Berry et al. [2005] and Berry et al. [2006b]), including genetic algorithms Ostfeld and Salomons [2004] and Weickgenannt et al. [2010]. The high computational cost as well as the large set of possible contamination events has led, in some instances, for a focus on worst case scenarios Watson et al. [2006] with atten- dant economic burden. In addition to computational challenges, the denition of meaningful measures of damage Watson et al. [2004], Krause et al. [2008] is also critical for a well-posed description of the problem. Several indices are relevant to characterizing the impact of an intrusion on a water distribution network. These include detection time, aected population and volume of contaminated water injected prior to detection. Furthermore, and while most analysis methodologies consider sensors with perfect responsiveness to contamination, it can be expected that bias and signal to noise characteristics of sensors should play a role in sensor network design Berry et al. [2006a]. Another important issue relates to spatial variability of the ow over the water network. Thus small perturbations in the nodal demand values can extensively modify the ow pattern causing dierent spread of the same contamination sce- nario. Stochastic approaches have been formulated to address this issue Shastri and Diwekar 20 [2006], usually by assuming some network input parameters, such as the nodal demands, to be random. Recent probabilistic characterization of the water distribution network has been directed towards assessing the risk from natural hazards in the context of interacting infras- tructure systems Chang et al. [2002]. Addressing any of the uncertainties relevant to the optimal sensor placement problem will signicantly increase the computational burden of an already NP-hard problem (i.e. combinatorial optimization problem), and algorithmic develop- ments are critical for enabling further progress. A recent formulation of the problem using cost functions with submodular properties enabled ecient greedy algorithms to nd near optimal solutions Krause et al. [2008]. Brie y, submodularity pertains to the observed fact that the marginal benet of adding one sensor decreases as the number of sensors increases. We build on this development to carry out a stochastic extension of the problem. We introduce both stochastic parameters in the hydraulic simulations, re ecting limits on predictability of the system, and uncertainty about the performance of sensors, in the form of a probability of detection. An extensive analysis indicates that the optimal sensor network is highly dependent on both forms of uncertainty. 1.2 Mathematical Representation A water distribution network can be described by a graph G(V;E) where the nodes V = fv 1 ; ;v n g represent the junctions, the sources and the reservoirs while the edges E = fe 1 ; ;e m g correspond to pipes. For given pressures at various pumping stations, the ow of water through the network is controlled at all times by nodal demands, which vary continually with time, causing the ow to change speed and direction. Consequently, the sensor placement problem requires the ability to perform hydraulic simulations to evaluate the ow pattern at any time under condition of known or assumed demand, as well as water quality simulations to 21 track how dierent contamination scenarios spread through the system. We rely on EPANET Rossman [1999], a computer program developed by EPA's Water Supply and Water Resources Division as our main simulation tool. EPANET performs time domain simulations of hydraulic and water quality behavior of pressurized pipe networks. Several properties of the network were controlled during the simulations; for example, the nodal base demands and the time patterns associated with the nodes were perturbed for the stochastic analysis and the contamination source, mass and injection time were changed for each scenario. We dene the set of all "possible" contamination scenarios I, where each scenario i 2 I consists of a source, a contaminant type and mass as well as an injection time and a probability of occurrence P (i). Because the set of all possible contamination scenarios is of innite cardinal (not only could it happen at any node but more importantly at any time, at any concentration level and of any kind of contaminant), this question raises the issues of dening a representative set of scenarios as well as quantifying damages in a water distribution network after such an event. During a water quality simulation, contaminant particles are tracked passing through the network allowing for the construction of time series describing the evolution of the contaminant in terms of its concentration at every node: 8v 2 V and8i 2 I, we introduce the nodal concentration time seriesfc i (v;t);t2 [0;T ]g, whereT is the total simulation time. For each node, we focus on the arrival time of the rst non-zero concentration, which would also be the time of detection if a sensor was located at that node. We thus dene T i (v) to be the time for scenario i to reach node v. Let S be the set of all possible sensor locations, and let jSj denote the cardinality of S. Since in our formulation sensors are placed only at nodes, SV . Aksensor placementA, wherekjSj is a subset ofS. For a given scenarioi, the detection time for a sensor placement A is dened as 22 T i (A) = min s2A T i (s) (2.1) In addition, the impact associated with a particular sensor placementA for a given scenarioi is denoted by i (T i (A)). The impact function i can have many forms; in the past, measures of damage have included detection time, consumed or delivered contaminated water and aected population prior to detection. It will be dened later on. 1.3 Optimization Problem Deterministic System Let i (t) be the measure of damage if scenario i is detected at time t (if the intrusion is not detected, thent =1). is nondecreasing, i.e. i (t) i (t 0 ); 8t<t 0 . One key assumption that allows i to be essentially a function of time, is the idea of global alarm which can be stated as follows: as soon as a contamination is detected, a global alarm is raised, which prevents further consumption of polluted water. Due to some mathematical properties relevant to our solution method, we consider the maximization of the damage reduction function instead of the minimization of the impact function itself. For all of scenario i, the damage reduction function generated by a subset of sensor placement A is expressed the following way, R i (A) = i (1) i (T i (A)) : (2.2) Let us denote by R(A), the expected impact reduction for a sensor placement A over all scenario i2I. Then, R(A) = X i2I P (i)R i (A) ; (2.3) 23 where P (i) is the probability of occurrence for scenario i. We may now formulate our k- sensor placement optimization problem as the setAS that maximizes the expected damage reduction over all intrusion scenarios i2I weighted by their probability of occurrence. A = arg max ^ AS;j ^ Aj=k R( ^ A) (2.4) To solve this optimization problem, we need to perform one hydraulic simulation andjIj water quality simulations. This formulation is equivalent to nding A that minimizes the expected damage: (A) = P i2I i (T i (A)). However the damage reduction function presents some advantages over the damage function itself as it is submodular Krause et al. [2008]. Stochastic System In a more realistic setting, while the average demands may be known, uctuations occur from one day to the next; for example, water consumption may increase during hotter days and the regions of high demand may dier during workdays vs. holidays. Furthermore, rare events, such as res or water leakages, also contribute to network variability. As mentioned in the introduction, the ow in water distribution networks can be rather sensitive to uctuations in nodal demands. Slight perturbations in the demands can cause the watercourse to change direction in several pipes. Because the way pollution spreads through a network directly follows ow patterns, the same scenario can draw very dierent impacts for distinct demand patterns. This point is depicted in gure (2.1) showing simulated results, using EPANET, corresponding to the same pollutant migrating under dierent ow patterns. Thus the ecacy of a particular sensor placement against a specic scenario may be diminished with an altered ow pattern. To account for variability within networks, we performed n Monte Carlo simulations with random demands. Each demand pattern sample consisted of a perturbed nodal base demand 24 Figure 2.1: Two dierent spreads of the same pollution scenario due to variations in the water demands; the pollution source is indicated by the black star. The snapshot was taken at 10pm on day one in both cases. 25 as well as a perturbed nodal time pattern. For each demand realization, we perform a hydraulic simulation and for each contamination scenario we perform a water quality simulation. Finally, we compute the expected damage reduction over all demand patterns and all scenarios. For the j th demand sample, we denote the expected impact reduction for a placement A by R (j) (A). The new optimization formulation is the following: A = arg max ^ AS;j ^ Aj=k 1 n n X j=1 R (j) ( ^ A) = arg max ^ AS;j ^ Aj=k E[R( ^ A)] (2.5) where the expectation is computed over all demand scenarios. Imperfect Sensor Sensibility To this point, the analysis has supposed perfect sensors, whereby each sensor is theoretically capable of accurately detecting any kind of contaminant at any concentration level. However, the possible failure of a sensor or a group of sensors to detect a contamination is likely to happen if the concentration levels are too low and also depending on the chemical type. This problem has been theoretically addressed in a previous paper by Krause et al. [2008]. In this formulation, we impose a weight factor denoted by p i (A) for sensor placement A, which increases with the pollution concentration at the time of detection, speculating that the optimization will favor placements with some sensors located close to the sources of higher impact scenarios. Accordingly, we modied the impact reduction function from equation (2.4) as follows, A = arg max ^ AS;j ^ Aj=k R( ^ A) = arg max ^ AS;j ^ Aj=k X i2I P (i)p i ( ^ A)R i ( ^ A) (2.6) 26 1.4 Solution Method: the Greedy Algorithm An exhaustive search to nd the optimalksensor placement over all possible placement com- binations in water networks is an NP-hard problem and would quickly become computationally prohibitive as the number of nodes increases. Alternative algorithms must be developed, even at the cost of yielding sub-optimal solutions. The submodularity of the damage reduction function yields interesting theoretical properties that are very relevant in this context. Sub- modularity implies that for all subsets ABS and element s2S, the damage reduction function satises the following properties, 1. R(?) = 0 2. 06R(A)6R(B) 3. R(A[fsg)R(A)>R(B[fsg)R(B) . The monotonic increase and submodularity of the reduction function permit the use of greedy algorithms to nd very good near-optimal solution to our optimization problem. The idea behind the greedy algorithm is simple. It sequentially adds the best sensor location to an existing placement A, starting with the empty set. It has specically been shown Nemhauser et al. [1978] that greedy algorithms produces near-optimal solutions when the objective function is non-decreasing and submodular. Near-optimality, in this context, can further be quantied in the sense that the set A G resulting from this method is such that R(A G )> (1 1 e )R(A ), where A is the optimal k-sensor placement and e is the exponen- tial constant. Let T hyd and T qual be the computation cost to run a hydraulic and a water quality simulation respectively using EPANET and n be the number of Monte Carlo simu- lations, the total computational cost is now reduced from O jSj k +jIj T qual +nT hyd to 27 O (kjSj +jIj T qual +nT hyd ). Factorial growth in computational cost has thus been replaced by linear growth with the network size. A pseudocode of the algorithm can be found in Leskovec et al. [2007]. With these greedy algorithms available, the exploration of stochastic variabilities can be carried out in an ecient and comprehensive fashion. Results from that analysis are described next. A pseudocode of the algorithm is shown in Algorithm 4. Algorithm 4 Greedy(S,A,s) A ; for j from 1 to k do s =argmax s2S R(A[fsg)R(A) A A[fs g end for Using the submodular property further, Krause et al. [2008] were able to reduce the amount of necessary evaluations of the reduction cost R(A). Thus, letting A (s) = R(A[fsg) R(A); 8s2SrA, allows one to write, A (s)> A[fsg (s 0 ); 8s 0 2SrA[fsg by denition of submodularity. Therefore by sorting the possible sensor placements s in decreasing order of their associated A (s) for a xed subset A, as a new element is added to set A (i.e. A = A[fs g), it becomes unnecessary to reevaluate A (s) for all s2 SrA[fs g. It is likely that the sensor placement that is at the top of the queue will remain at the top after reevaluation and if not, only a few top elements need to be reevaluated. The new algorithm is depicted in Algorithm 5 below. With these greedy algorithms available, the exploration of stochastic variabilities can be carried out in an ecient and comprehensive fashion. Results from that analysis are described next. 28 Algorithm 5 Improved Greedy(S,A,s) A ; for all s2S do insert (s) R(fsg) into QfQ is a queue sorted in decreasing orderg end for A s such that (s ) =Q top remove element at the top of Q for j from 1 to k do repeat new (s) R(A[fsg)R(A) for s such that (s) =Q top remove element at the top of Q insert new (s) into Q until new =Q top A A[fs g such that (s ) =Q top remove element at the top of Q end for 29 1.5 Results and Discussion The following experiments are designed to compare the results obtained with the three dif- ferent optimization contexts described earlier: known nodal demands, stochastic demands and imperfect sensors. The set I of contamination scenarios remains the same in all the experiments and contains contamination events at each node in the network occurring at ve dierent times of the day. The contaminant mass and injection length are kept constant, thereforejIj = 5jVj. The quantity to minimize is in all cases the aected population density, which is directly proportional to the water demand values d v (t) at the aected nodes. Let be a threshold below which the contaminant concentration is considered harmless EPA [2010] and letc i (v;t) be the chemical concentration at network nodev and timet generated by con- tamination scenario i, then we can compute i (t), namely the impact generated by scenario i2I at time t the following way: i (t) = X t;v2V : c i (v;)> d v () The probability P (i) of scenario i2 I (equations 2.3 and 2.6) is equal to 1 jIj , since the probability distribution over all scenarios is chosen to be uniformly distributed in the absence of information. Finally, the detection probability of a sensor p i (s), for s2S is calculated as follows: let t s (i) be the rst detection time of scenario i by sensor s, then p i (s) = c i (s;t s (i)) c max where c max is a large chemical concentration value. For a sensor placement A, let s 0 = arg min s2A t s (i) then we can write thatp i (A) =p i (s 0 ). In order to test how the variability in the demand patterns in uences the optimal placement, the demand patterns are assumed to 30 be either deterministic or stochastic. In the deterministic case simulations, the demands are known exactly at all time and at every node. In the stochastic case, we introduce randomness by generating the nodal demands and the associated time patterns following dierent proba- bility distributions. We perform Monte Carlo simulations where the nodal demand values are independently distributed either from the uniform distribution (U(d v (1);d v (1 +))) or the normal distribution truncated at zero ( N(d v ;d v )), alternatively taking magnitudes of 0:1, 0:25 and 0:5. For each generated demand pattern and every contamination scenario in I, we run a water quality simulation. During the quality simulation, we measure the impact i (A) that a particular scenarioi2I has on the network if sensors were placed at node subset AV . In the random demand case, we compute E[ i (A)], namely the average impact over a large set of demand patterns. Two sets of optimal sensor placements were found using the greedy algorithm; one solution for the perfect sensor assumption and one for the imperfect sensor assumption. Finally, the robustness of the placements were tested against the sensor sensibilities, i.e. whether the sensors respond perfectly or imperfectly to contaminations. The optimal sensor placements are computed for three sets of assumptions: LetS 1 be the assumption that nodal demands have deterministic values and that sensors are perfect. Let S 2 assume stochastic demand values (only averages are known) as well as perfect sensor sensitivity. LetS 3 be the most realistic case, where sensors respond imperfectly to contaminations and the demand values are stochastic. TofS j ; j = 1; 2; 3g we associate the three optimal sensor location setsfA j ; j = 1; 2; 3g , which are evaluated with the ecient greedy algorithm described previously. In the stochastic 31 demand case and for all water network studied in this chapter, we performed 1000, 2500 and 5000 Monte Carlo simulations. Since the results obtained with 2500 and 5000 simulations consistently agreed with each other, we concluded that convergence was reached for 2500 Monte Carlo runs. The number of MC simulations is denoted by n in equation 2.5 We compare the performance of the three resulting sensor placementsfA j ; j = 1; 2; 3g. To enable comparison between the dierent solutions, the average impacts at each node v2V over all scenario inI, denoted byE( l;k (v)), are measured for allfA k ; k = 1; 2; 3g and under each assumption setfS l ; l = 1; 2; 3g. The experiment was performed and results are presented for 3 synthetic water networks. The rst one "ex5" has 180 nodes, 250 pipes, 1 reservoir and 1 tank. The second one "ex6" has 110 nodes, 120 pipes, 1 reservoir and 1 tank. The third one "ex7" partially shown in gure 2.4 has 380 nodes, 472 pipes, 1 reservoir and 1 tank. The EPANET object library developed by the Environmental Protection Agency Rossman [1999] was used in C++ to perform all the hydraulic and water quality simulations needed for our problem. The computation time on "ex5" was about 1 minute per MC simulation on a single dual-processor node. Figure 2.2 illustrates how the impact reduction increases with the number of sensors in place. All impact reduction functions are monotonically increasing with the increasing number of sensors. In the top graph, the impacts are computed according to assumptionS 2 so placement A 1 is sub-optimal in this case, resulting in higher impact values compared to the optimal A 2 placement. The bottom graph shows impacts measured with S 3 . A 3 , which is the optimal placement assuming S 3 exhibit very low impact values compared to A 1 and A 2 . However, with more than 10 sensors in place, A 1 outperforms A 2 . The next bar plot, (gure 2.3), shows the impact values for a sample of contamination scenarios on a water distribution network. The impact values computed assuming S 3 with A 1 or A 3 in 32 place, show that 5 sensors are simply insucient to protect the water network. Increasing the number of locations fromA 3 to at least 20 sensors, greatly reduces the contamination impact, however the impacts only decrease slightly when increasing the number of sensor locations from A 1 . Figure 2.4 depicts "ex7" from EPANET with optimal 20-sensor placements A 1 (top), A 2 (center) and A 3 (bottom). Each gray circle is proportional in size to the average water demand at that node and the gray lines linking the nodes represent the pipes; each line is also weighted proportionally to the water volume that ows through it. The triangles show the sensor locations. Clearly, dierent strategies lead to very dierent optimal sensor placements. The average nodal impacts, represented by the circles, are all evaluated assuming S 3 . There is a large cluster of large nodal impacts near the source with A 1 . We observe a cluster of large impacts with A 2 near the source as well. The nodal impacts with A 3 in place are noticeably smaller, but also more uniformly distributed over all the nodes, suggesting that even though contamination events of small magnitude may happen, large scale events will be avoided. We noticed, in this example, that A 1 and A 2 agreed half the time, A 2 and A 3 had three common locations, but A 1 and A 3 disagreed on all placements except for one node. In addition, we were able to observe that in theS 2 context, the optimal sensor locations changed as we modied the nature of the nodal demand variability from the truncated normal to the uniform distribution. However the optimal imperfect sensor placement (A 3 ) conducted under the assumption S 3 was robust to changes in the uncertainty description. 1.6 Concluding Remarks Availability of ecient algorithms (greedy algorithms) and adapted software tools (EPANET) enable comprehensive queries and control of complex water distribution systems. We have 33 explored the problem of optimal sensor placement under stochastically-specied information and demonstrated the value of such an approach by quantifying the impact of information acquisition through the addition of sensors. We can draw several conclusions from our simula- tion results. First, the obtained sensor locations signicantly depend on the initial hypothesis regarding the uncertainty in the water demands as well as the sensor accuracy. Secondly, placements that are optimal in a particular context can perform poorly under another set of assumptions. Finally, the optimal imperfect sensor placement was more robust to uctuations in the demand values than the perfect sensor placement. However, the number of sensors re- quired to eectively protect water networks in the imperfect sensor case is signicantly higher than when sensors are assumed to be perfect detectors. Because the robustness of the resulting placements to uncertainty in the input parameters is an issue, the optimization formulation could be improved by adding a constraint on the variance of the impact reduction function. In addition, although we have assumed random nodal demands, it may be more realistic to also introduce randomness into the pipe roughness coecients since these parameters greatly in uence the ow patterns as well. Finally, raising a global alarm as soon as a sensor detects pollution comes at a considerable xed cost of shutting down the whole system. An alternative would be, however, to allow for the containment of the threat by shutting o only the contaminated parts of the network. Hence, a cost associated with the control of the spread could be included in the optimization problem. 34 0 5 10 15 20 25 30 35 40 45 0 50 100 150 200 250 300 Number of Sensors Impact Reduction Impact reduction with hypothesis S2 A1 A2 0 5 10 15 20 25 30 35 40 45 0 50 100 150 200 250 Number of Sensors Impact Reduction Impact reduction with hypothesis S3 A1 A2 A3 Figure 2.2: Ex7: Impact reduction function with respect to the number of sensors. Top gure: S 2 assumption. Bottom gure: S 3 assumption. 35 Figure 2.3: Ex6: Assumption S 3 total impact for sample scenarios with sensor located at A 1 and A 3 36 Figure 2.4: Ex7: Assumption S 3 , sensor placements A 1 (top), A 2 (middle) and A 3 (bottom) and associated average nodal impacts, which are scaled up for visualization (the nodal impact cannot exceed the nodal demand) 37 2 Two-Stage Stochastic Optimization In the previous section, I explained how to design a ksensor network that minimizes the impact of contamination events on the consumers of a water distribution system. The as- sumptions included a uniform distribution over all possible contamination scenarios, random water demands at the nodes of the water network and imperfect sensors. In the following section, instead of placing all k available sensors initially, n sensors are assigned to the near- optimal location nodes (in the sense of impact minimization previously described). Then, in the case where a contamination event is detected by one or more sensor in place,m additional sensors are to be deployed to localize the source of the contamination (where k = n +m). The second stage assumes the ability to rapidly deploy a new sensor network when contami- nation observations are recorded. Measurements of the contamination pattern recorded at the rst stage sensor locations allows to draw inference on the space of possible contamination sources, the second stage thereby involves far fewer uncertainties than the rst stage problem. The source identication problem has been approached assuming a linear dependency of the contamination pattern to the sources of contamination (Tryby et al. [2010]). An inverse prob- lem can also be formulated to estimate the time and location of contamination sources, such as in Laird et al. [2005] where nonlinear programming is used to solve the inverse problem, however those methods often demand high computational costs. Here, we take advantage of the EPANET toolkit capabilities to relax the linear assumption and we compute contamina- tion pattern probabilities based on hydraulic and quality simulations. A Bayesian approach is then employed to update the probability distribution of contamination scenarios based on the concentration readings of the sensors in place. 38 2.1 Bayesian Update To place the rst stage sensors in the water network, we assume that every contamination scenario is equally likely to happen so that we do not suppose any prior knowledge on the potential contamination events. Once pollution occurs, the Bayesian formalism along with the data gathered by the sensors allow us to update the uniform probability distribution function over the space of contamination scenariosI, thereby reducing the set of possible contamination sources. Let set I describe the space of all possible contamination scenarios and A =fx j g j=1;:::;n be the set of sensor placement obtained from the rst stage formulation. In the event that a chemical contaminant is being injected in the water distribution system at source node ^ i2I, the rst stage sensors in place will record time series of chemical concentration measurements denoted byf^ c j (t)g j=1;:::;n . Let ^ C =f^ c j (t d )g j=1;:::;n be the contamination pattern at network nodesA at rst detection instancet d , then we can compute a family of conditional probability distributions on the observation space f i ( ^ Cji), for all i 2 I, which is interpreted as the likelihood that scenario i has generated observed contamination pattern ^ C. Therefore, by lettingL(i) = f i ( ^ Cji) be the likelihood functions in the sense of Bayes, and P I U(I) be the uniform probability distribution over the set of scenarios I, called the prior, we can compute the posterior pdf ^ P I (i) =f(ij ^ C) over scenario space I by applying Bayes' theorem: ^ P I (i) = L(i)P I (i) R I L(j)P I (j)dj 39 2.2 Second Stage Optimization The Bayesian update allows us to narrow down the space of possible contamination source scenarios responsible for a given contamination pattern. Now similarly to the rst stage op- timization, we can run an impact minimization utilizing the bayesian inference on the source parameterization to reduce the search space in which the optimal second stage sensor place- ment lives. Using the same notation as in equation 2.6 in the rst stage optimization, we write the second stage optimization problem as follows: denoted byR(B) the impact reduction function of sensor placement B then R(B) = X i2I ^ P I (i)R i (B) ; (2.7) where ^ P I (i) is the posterior probability of occurrence for scenario i, after contamination observations on the set of rst stage sensors A =fx j g j=1;:::;n have been made following a pollution event ^ i. We may now formulate our second stage m-sensor placement optimization problem as the set BS that maximizes the sum of the damage reduction functions for all intrusion scenarios i2I weighted over their updated probability of occurrence ^ P I (i). B = arg max ^ BS;j ^ Bj=m R( ^ B) = arg max ^ BS;j ^ Bj=m X i2I ^ P I (i)p i ( ^ B)R i ( ^ B) (2.8) The greedy approach described in Algorithm 5 is again used to solve the optimization problem. 40 2.3 Results and Discussion Practically, to compute the likelihood function, we use results from the Monte Carlo sim- ulations that were required to solve the rst stage optimization problem. In particular, we have recorded the expected impact reduction if scenario i is detected at every node v2 V : E[R i (v)], where the expectation is over all generated stochastic demand patterns. We also computed the expected total impact generated by contamination scenario i with no sensor in place: E[ i (1)]. If contamination ^ i occurs and is detected for the rst time at rst stage sensor location x j 2A, then we can reduce the space of potential scenarios by writing ^ I j =fi2 I : E[R i (x j )] > 0g. This is similar to interpreting the likelihood function in the sense of Bayes as: L(i) = f(x j ji) = E[R i (x j )] E[ i (1)] . Figure 2.5 shows that if a contamina- tion event is detected by the rst stage sensors, the bayesian update of the pdf of I results in a large number of scenarios having a null probability of occurrence. This is caused by the contamination pattern observed at the rst stage sensor locations having possibly been generated only by a subset of contamination scenarios given the structure of the water network. The following results were obtained on a synthetic water network composed of 400 nodes and 450 pipes named net1. We show how the impact caused by contamination scenarios can be reduced on average as sensors are added during the second stage. In gure 2.6, one can observe the systematic impact reduction as a sensor network of sizeM (colored surface) is added to the rst stage network of sizeN (gray surface) when pollution is detected.This result is intuitive, however mitigation can still be observed when we compareN +M rst stage sensor networks with N rst stage combined with M second stage network as illustrated in gure 2.7. The mitigation in gure 2.7 is due to the second stage sensor network being almost systematically 41 located closer to the "real" source of contamination. It is not systematic though, as gure 2.7 shows instances where is it better to place all N +M sensors during the rst stage analysis. Peaks in the colored surface are seen exceeding the impact generated with the single-stage network. Yet for N < 100 and M < 100 the two-stage approach almost always results in lower impact values than the single-stage one. The Bayesian update is successful in narrowing down the set of possible sources of pollution given a chemical concentration pattern recorded by the rst stage network.The second stage sensor network are this way drawn closer around those possible sources. Figure 2.8 illustrates the two-stage sensor placement as the second stage sensor network is deployed after a particular contamination event has taken place (shown in orange color). The second stage network is clustered around the source of contaminant injection, thereby delimiting the polluted area of the water distribution system. 2.4 Conclusion The size of the rst stage sensor network is important; the more sensors are initially placed in the water distribution system, the less contamination scenarios will spread undetected. In the rst stage analysis, we have shown how to determine a "good" sensor network size in order to mitigate the expected impact from pollution. However, as a contamination event takes place and chemical concentration patterns are being recorded by the rst stage sensors in place, the second stage analysis, namely the Bayesian update of the probability distribution function over the set of contamination scenarios, is quickly able to localize the source of contamination and delimitate the pollution area in the water distribution system. This approach is very useful during real life situations where it is practically impossible to shut down the whole water network if pollution is being recorded. However, it is one crucial assumption in the rst stage analysis. The second stage optimization relaxes this assumption allowing us to 42 0 20 40 60 80 100 120 140 160 180 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 x 10 −3 Sample updated probability distribution Sample contamination scenarios Probability of scenario Uniform distribution Bayesian updated distribution Figure 2.5: net1: updated probability distribution function of the set of potential contamina- tion scenarios I draw boundaries around the contaminated region and the source of contamination. Once the Monte Carlo simulations have been run for the rst stage, the second stage becomes an ecient source identication method requiring little computational power. 43 Figure 2.6: net1: Comparison between the impact generated with N rst stage sensor network and N rst stage + M second stage sensor network Figure 2.7: net1: Comparison between the impact generated with N+M rst stage sensor network and N rst stage + M second stage sensor network 44 Figure 2.8: ex7: First stage and resulting second stage sensor networks after contamination event 45 3 EPANET EPANET, Rossman [1999], is a computer program that was developed by EPA's Water Supply and Water Resources Division. The program performs hydraulic and quality simulations in pressurized water distribution networks. A water network is an ensemble of pipes, nodes (pipe junctions), pumps, valves, tanks and reservoirs. EPANET computes the ow in each pipe, pressure in each node, water level in each reservoir at discretized time steps throughout the simulation. During water quality simulations, the program also computes chemical concen- trations in dierent regions of the network. EPANET is used for dierent tasks such as for residual chlorine simulations and estimating consumers' exposure to a contaminant. Addi- tionally, it provides a tool to manage water networks in terms of dening strategies to adjust the pump regime and the injection of chlorine and schedule maintenance and replacements of pipes. EPANET graphic interface is available for Windows systems, and the programmer's toolkit is a dynamic link library of functions that can be incorporated in C/C++ programs. 3.1 Hydraulic and Quality Modeling Capabilities EPANET hydraulic characteristics and capabilities include the handling of unlimited network size, dierent storage tank shapes, nodes driven by multiple and independent demand patterns and simple as well as more complex rule-based controls. It uses Darcy-Weisbach, Hazen- Williams and Chezy-Manning equations to compute friction head loss, it includes minor head losses for bends and ttings and is able to model constant or variable speed pumps, dierent valve types and pressure-dependent ow issuing from emitters (sprinkler heads). As for the quality simulation capabilities, those include the modeling of the movement of a non-reactive tracer material through the network over time, the movement and concentration 46 variations of a reactive material over time, the age of water throughout a network, reactions both in the bulk ow and at the pipe wall, and dierent water mixing capabilities in storage tanks. EPANET can also track the percent of ow from a given node reaching all other nodes over time, allow growth or decay reactions to proceed up to a limiting concentration, employ global reaction rate coecients that can be modied on a pipe-by-pipe basis, and allow for time-varying concentration or mass inputs at any location in the network. With those abilities, EPANET can simulate the water mixing from dierent sources, the age of water in the network, residual chlorine decay and the spread throughout the network of pollutant introduced at particular nodes. 3.2 Simulation Algorithms Hydraulics Conservation of mass and head loss equations are solved by a hybrid node-loop approach also called the Gradient Method. Let us denote by H the nodal head, h the headloss, r the resistance coecient, Q the ow rate, n the ow exponent and m the minor loss coecient. Let also N be the number of nodes and NF the number of tanks and reservoirs, then the ow-headloss relation in a pipe between node i and j can be written as: H i H j =h ij =rQ n ij +mQ 2 ij (2.9) For pumps, h ij =! 2 (h 0 r(Q ij =!) n ) where h 0 is the shuto head for the pump, ! is a relative speed setting and r andn are the pump relative coecients. Conservation of mass or ow continuity must also be satised for all junctions i = 1;:::;N: 47 X j Q ij D i = 0 (2.10) where D i is the water demand at node i. Equations 2.9 and 2.10 must be solved for all head H i and ows Q ij Let A be an NN Jacobian matrix computed as A ii = P j p ij and A ij =p ij where p ij = 1=(nrjQ ij j n1 + 2mjQ ij j) is the inverse derivative of the headloss in pipe linking i and j with respect to the ow or for pumps p ij = 1=(n! 2 r(Q ij =!) n1 ). Let H be the vector of unknown nodal heads andF an N 1 vector such that F i = ( X j Q ij D i ) + X j y ij + X f p if H f wherey ij =p ij (rjQ ij j n +mjQ ij j 2 )sgn(Q ij ) for pipes andy ij =p ij ! 2 (h 0 r(Q ij =!) 2 ) for pumps. Then the gradient method arbitrarily initializes ows in pipes and at each iteration, nodal heads are found by solving the following system: AH =F (2.11) after what new ows are computed from: Q ij =Q ij (y ij p ij (H i Hj)) (2.12) which always satises nodal mass conservation. Iterate equations 2.11 and 2.12 until the sum of absolute ow changes relative to the total ow in all links is smaller than some tolerance value. 48 Quality Water quality simulations in EPANET are based on conservation of mass coupled with reaction kinetics. The subject is treated in details in Rossman et al. [1993], Rossman and Boulos [1996], Rossman [1999] and Liou and Kroon [1987]. In pipes, a dissolved chemical will move on average, at the speed of the carrier uid simultaneously growing or decaying at a given rate. The advective transport in pipes is modeled as: @C i @t =u i @C i @x +r(C i ) (2.13) where C i is the concentration in pipe i and depends on distance x and time t. u i is the ow velocity in pipe i and r, the rate of reaction as a function of concentration. At nodes receiving in ow from two or more pipes, mixing is assumed complete and instanta- neous. And the concentration of an out owing chemical is the sum of the in owing concen- trations weighted by the ow. Denote by i the link with ow leaving node k, I k the set of links in owing to k, L j the length of pipe j, Q j the ow in link j, Q k;ext the external source ow entering the network at k and C k;ext the concentration of Q k;ext , then for node k, we have: C ijx=0 = P j2I k Q j C jjx=L j +Q k;ext C k;ext P j2I k Q j +Q k;ext (2.14) WhereC ijx=0 is the chemical concentration at the start of linki andC ijx=L at the end of the link. Under completely mixed conditions the concentration in the tank is a blend of the current contents and that of any entering water. At the same time, the internal concentration could be changing due to reactions. Let V s be the volume in storage at time t, C s be the 49 concentration in the tank, I s the set of pipes in owing into the storage and O s the set of out owing pipes, then the tank content is expressed as: @V s C s @t = X i2Is Q i C ijx=L i X j2Os Q j C s +r(C s ) (2.15) Equations 2.13, 2.14 and 2.15 dene a system of coupled dierential equations with time- varying coecients to be solved for all C i in pipe i and C s in tank s given initial conditions specifyingC ijt=0 andC sjt=0 for all pipes and tanks and boundary conditions specifying external mass injections C k;ext and Q k;ext as well as hydraulics specifying volumes in tanks and ow proles at all time. EPANET uses Lagrangian methods to discretize in time alone and track the concentrations of discrete volume elements as they move through the network. 50 Like many urban infrastructures, natural systems can also exhibit complex behaviors. The next chapter examines the dynamics of forests that are subjected to changing disturbance regimes. 51 Chapter 3 Evolutionary and Ecological Impacts of Disturbance Regimes on Vegetation Structures Abstract This study explores the ecological and evolutionary impacts of dierent disturbance regimes on forests. The framework consists of a spatially explicit and individual-based forest model designed around four functional traits of trees, namely leaf mass per unit area, maturation height, wood density and seed size. We allow those trait to vary among the population, thereby enabling evolutionary processes through the potential transmission of mutated traits. The model includes a detailed representation of tree physiology. Given the amount of light that a tree can locally intercept, the physiological model derives the growth of the living tissues and the reproduction and death rates. The available light is computed over the entire three- dimensional domain according to the shade created under the forest canopy. The simulated forests are then subjected to dierent disturbance regimes that are generated following a stochastic Poisson process. The ndings of this study will facilitate the assessment of the short-term as well as the long-term impacts of dierent disturbance regimes on forests. 52 1 Introduction Understanding vegetation dynamics and worldwide vegetation structures is crucial for pre- dicting how plants respond to climate change and how their response feeds back into the atmosphere. The latter occurs through changes in the carbon cycle and in the albedo of the terrestrial vegetation cover. Acquiring a detailed understanding of vegetation dynamics is challenging, however, due to the many diverse processes that drive the ecology and evo- lution of vegetation structures. At the physiological level, light, moisture, temperature, and nitrogen availability are the key factors to which plants have to adapt. At the level of an individual tree, the neighborhood interactions that arise from competition for light and from soil processes in uence the local conditions that in turn aect tree development. At the level of a stand or forest, natural and anthropogenic disturbances such as re, disease spread, or logging greatly in uence the size distribution and species composition of vegetation structures. In addition to the direct impact on forests, disturbance processes also aect the demographics and life-history strategies of trees, although little is known about those eects. Disturbance regimes vary greatly from woodland to woodland. In tropical forests, for example, the typical disturbance is a single large tree toppling over, leaving a gap in the canopy for other trees to exploit. In contrast, disturbances in temperate forests are usually associated with re and wind that sweep through large areas, destroying trees along their path. Human activities are also increasingly altering the natural disturbance patterns of forests, through logging or the prevention of forest res. A better understanding of the ecological and evolutionary impacts of dierent disturbance regimes may not only help explain patterns in worldwide vegetation, but also elucidate the long-term implications of anthropogenic impacts. Field studies have at- tempted to observe the adaptive response of plants to disturbances, but interpreting such data 53 is dicult when neighborhood interactions and disturbance regimes lead to antagonistic selec- tion pressures. As an alternative to eld studies, theoretical investigations based on dynamical forest models have the potential to yield insights into the impacts of disturbance regimes on phenotypically plastic and evolutionary processes of plants. Numerical analyses of disturbed forest dynamics may furthermore help craft policies that will support nature conservation and sustainable exploitation. While dynamical forest models have the potential to yield important insights into the impacts of dierent disturbance regimes, the complex mechanisms involved in forest dynamics are dicult to reproduce. Multiple temporal and spatial scales, ranging from those of tree-to-tree interactions to those of global environmental change, make forest modeling a challenging task: should the focus be on individual trees, on patches of signicant scale, or on vegetation landscapes? Forest dynamics have been modeled at all three levels; the LANDIS model by Mladeno [2004] was developed during the 1990s in an eort to study forests at a landscape scale. LANDIS runs on a mesoscale grid, with each cell representing a forest stand in terms of age classes of dierent tree species. While providing the neces- sary tools for studying a forests interactions with natural and anthropogenic disturbances, the LANDIS model lacks the mechanistic complexity of individual-based models. Gap models (JABOWA, Botkin et al. [1972]; FORET, Shugart and West [1979]) describe forest growth on a coarse grid, representing patches of trees. Horizontally, patch properties are homogeneous; yet, a patchs vertical structure accounts for each tree the patch contains. Patches change independently of one another, because inter-tree distances are ignored. Later models included interaction between adjacent patches (ZELIG, Urban et al. [1991]), however the absence of horizontal structure within patches rendered the task less eective. The lack of horizontal interactions in gap models motivated the developments of models with rened grid resolu- tion that explicitly account for exact tree locations (SORTIE, Pacala et al. [1993]). In such 54 models, grid points are either empty or occupied by individual trees, and interactions among trees occur over in uence zones dened according to tree morphology. The spatially-implicit Perfect Plasticity Approximation (PPA, Purves et al. [2008]) is based on the assumption that projected tree crown area can completely cover the canopy. Strigul et al. [2008] showed that the simplied dynamics described by the PPA agree well with those of a more detailed spatially explicit individual-based model. In summary, spatially explicit models of the dynamics among individual trees are promising tools for studying forest dynamics, but remain under-explored for evolutionary studies and lack explicit representations of plant functional traits. Despite of the high computational cost associated with individual-based models (Levin et al. [1997]; Busing and Mailly [2004]), those models are likely to oer the best choice for simulating disturbed forests, as they avoid unrealistic assumptions about the horizontal homogenization of disturbances that have to be made in patch-based models. In this study, we develop an individual-based and spatially explicit model of forest dynamics with trees characterized by four salient functional traits: leaf mass per area, height at onset of reproductive investment, wood density and seed size. For dierent disturbance regimes, we investigate the short-term ecological impacts in terms of salient aggregate statistics of forest structure, such as leaf area index and standing biomass and the longer-term evolutionary impacts in terms of changes in the predicted range of trait values. The results are critically compared with earlier ndings obtained by Falster et al. [2011] for a spatially implicit size-structured metapopulation model with disturbances. From the dierences in results, we identify the cases where the individual- based and spatially explicit model adds a necessary layer of information and improves the predictions with respect to the simpler patch model. 55 2 Model Description In this chapter, forests are described as spatio-temporal point processes in continuous d- dimensional space. Stochastic point process models of forests, where the trees form a system of dependent random variables, were rst used in Gavrikov and Stoyan [1995], Mateu et al. [1998], Stoyan and Stoyan [1998], Stoyan and Penttinen [2000], Busing [1991] and Busing and Mailly [2004]. Later on, the use of marked point processes (see S arkk a [2006], S arkk a and Renshaw [2006], Comas and Mateu [2007] and Comas and Mateu [2008]) allowed for increasing degrees of detail and interaction in forest representations. To avoid boundary eects, we use periodic boundary conditions, i.e. the forest domain is the evolving on the d-dimensional torusT := [0; 1] d . The torus is obtained by identifying opposite sides of the hypercube in R d , hence it has the characteristic of being nite yet unbounded, which simplies the problem mathematically while providing a rational resolution of boundary conditions. The state of a forest at a given time is therefore represented by a conguration of (marked) points on the unit torus. A point process on a domain is identied at any given time if one can enumerate all the individuals contained in any regionA of . This number will be denoted byN(A). A forest conguration t is dened by the set of all the trees t = n X i ; i = 1;:::;N(t) o occupying the forest domain at a given timet. Thus,N(t) is the counting process associated with the number of individuals in the forest at xed time t. The trees are individually characterized by their spatial coordinatesx i 2T and are identied by four functional traitsf i 2M that were chosen according to Falster et al. [2011]'s study. Specically, the traits that mark each individual consist of the leaf mass per area (lma), the maturation height (hmat), the wood density () and the seed size (s) (at birth and for ospring), therefore f i = (lma i ; hmat i ; i ; s i ) where each component is a scalar value inR. Seed dispersal, establishment, growth, reproduction, 56 and death dene the life cycle, which is driven by the trees energy budget. Photosynthesis or the amount of carbon dioxide that a tree can assimilate at the leaf level is fully driven by the local competition for light. The accurate representation of the local light availability is thus crucial for the present individual-based model. The assimilated energy is then redistributed via allometric relationships, for the respiration, the turnover and the growth of the living tissues supporting the tree, and reproduction. Deaths occur locally when trees are shaded by taller surrounding trees or regionally when a disturbance hit an area of the forest. 2.1 Physiology The chosen functional traits, namely the leaf mass per unit area, height at maturation, wood density and seed size, are based on a recent study by Falster et al. [2011]. In this paper, the authors show that those four traits are good tree descriptors and can be used to characterize plants. Leaf mass per area is a key leaf trait, which conditions carbon assimilation rates as well as adaptation to the lighting environment. Plants that can only grow in high light conditions, have thick leaves and high leaf mass per unit area, while plants that can grow deeper underneath the forest canopy have low leaf mass per unit area values. The maturation height re ects the tree age and the maximum tree height; a tree cannot produce any osprings until it has reached that size, after what most of its energy is allocated to seed production. Wood density is a property of the heartwood and as the seed size increases, the number of osprings decreases. The model also represents dierent tree tissues such as sapwood, bark, heartwood, ne roots and leaf mass. The leaf economics convert the intercepted light into energy that is allocated according to empirically dened allometric relationships, for the survival (tissue maintenance and turnover), the growth (production new tissues) and the seed production of 57 the tree. Respiration is dened to be proportional to the total mass of the living tissues. Net production, which is partially allocated to growth of the various tissues and partially to reproduction, is obtained by subtracting the expenses for respiration and leaf turnover from a trees total carbon dioxide assimilation. Allocation to reproduction depends on the height at maturation of the tree and determines the rate of ospring production. The physiological model and the allometric relationships used in this simulation study are described in detail in Falster et al. [2011]. 2.2 Shading and Growth In our model, competition between trees is exclusively based on light availability. This process is asymmetric in the sense that taller trees shade smaller trees. We assume that the extent to which one tree shades another one depends on their relative sizes and the horizontal distance between them. The light available to a tree is computed by assuming that the trees have " at tops", i.e., their crown is composed of a single layer of leaves at their current height (in Strigul et al. [2008], the authors argue that it is a realistic assumption). Higher layers shade the layers beneath if they overlap horizontally. The shading intensity is also weighted with opacity coecients according to the vertical separation between tree crowns. In the future, one or more layers of foliage may be added to the representation to better re ect the true vertical structure of tree crowns. Computing the light available to each tree is a necessary step to obtain the carbon dioxide assimilation rate accurately, thus we wanted to avoid the pairwise overlapping in uence regions approximation that is traditionally used in individual-based forest models (Comas and Mateu [2007]). Instead, we approach the problem on a three-dimensional grid and compute for each cell the level of light reaching it. The grid size was determined by making the observation that reducing the cell dimensions further did not rene the obtained 58 assimilation rate. The resulting tree-dimensional grid is called the shading matrix. Let A i (t) be the assimilation rate of treeX i at timet,A i (t) is a function of how open the forest canopy is at the height of the tree denoted by h i . Let E(x;h) represent the canopy openness value at x2T and height h, it can be expressed as a function of the leaf area index (LAI) at that point L(x;h), which depends on the total leaf area above that point. The LAI can be very costly to compute explicitly as the number of trees increases in the forest (see gure 3.1). E(x;h)2 (0; 1] is one if the canopy is fully open at point (x;h). Let a i be the ground area occupied by the canopy of tree X i , ml i be the leaf mass of X i and ! i = ml i =lma i be the total leaf area of individual X i , then we can write: A i (t) = X x2a i g E (E(x;h i )) (3.1) E(x;h) =e cL(x;h) (3.2) L(x;h) = X k:(x2r k )\(h k h) ! k =a k (3.3) where g E (x) =c p 1 x x +c p 2 (3.4) denotes a deterministic but nonlinear mapping and c, c p 1 and c p 2 are species specic coe- cients, respectively representing opacity, and leaf photosynthesis. Once the tree assimilation rate is computed, one can derive the mass production rate and update the state of each in- dividual by allocating energy to the maintenance of existing tissues, growth, seed production and death probability according to the following relationships, m i (t + t) =m i (t) +g m (A i (t))t ; i = 1; ;N(t) (3.5) where g m (x) = (c bio Yx (respiration +turnover)) (1reproduction) total mass (3.6) 59 Figure 3.1: Interaction between trees. A three-tree conguration gives rise to ve dierent leaf area index values L j denotes a deterministic and known relationship, m i (t) denotes mass at time t of individual X i , c bio is a constant converting assimilated CO 2 to dry mass and Y is the ratio of carbon xed in mass per carbon assimilated. And from m i , the size and composition of trees can be updated as well as the seedling rate and death probability. Seed Dispersal When a tree reaches its maturation height, it starts allocating some energy (or mass) to produce seeds. Those are then randomly dispersed around the tree according to a Gaussian kernel. The size of the kernel is parametrized in the model and for the simulation results following in this chapter, we used amplitudes of i = 0:1hmat i around the coordinates of each tree X i . In future analysis, we could look at the in uence of the Gauss kernel size on the results. One computational challenge was to develop an ecient sampling technique for 60 the seed dispersal. Indeed, as the seed size decreases, more seeds are produced at maturation resulting in heavy sampling requirements. We proceeded using rejection sampling. In other words, overlapping the shade matrix to the Gauss kernel reduces the possible sampling space to the areas most exposed to light. 2.3 Disturbances For this study, the simulated forests are subjected to disturbance regimes. We assume that the time between disturbances is exponentially distributed, the center of a disturbance occurs at a random point in space and a disturbance destroys a fraction of all trees located within a given distance from its center. Accordingly, disturbance regimes are modeled as spatially and time homogeneous Poisson processes with rate parameter (x;t) = where x is a spatial coordinate and t is time. We characterize a disturbance regime with three parameters: Average frequency of event . Average size of disturbed areas . Probability that any given tree dies within the disturbed area 2 [0; 1]. By taking the product of these three parameters I = , we obtain the intensity I at which the forest is disturbed. In this way, we can represent a wide array of disturbances, ranging from a single tree fall (small area with a high probability of tree death) to wind disturbing large forested regions (large area with a low probability of tree death). Complete spatial randomness, i.e. a constant rate parameter is an assumption that is easily relaxed when modeling real forested region. Also, a dependence of on the tree species (particularly on the wood density) and size can be added just as easily. Finally, the geometry of disturbed areas may also be controlled to account for real forest re dynamics (e.g. wind driven propagation). 61 This model varies from the classical re model on domainS rst introduced by Henley [1989] and Drossel and Schwabl [1992], where matches are falling on single sites of S at rate , instantaneously making those vacant. However, their are equivalent with = 1, ! 0 and = 1. For studies on real forest re characteristics in dierent forested landscapes and their power law and scale invariance properties see Holmes et al. [2008]. 2.4 Environmental Stochasticity Tree congurations are driven by a birth-death-growth-interaction process. Following a continuous- time Markov process, changes in the pattern of tree locations are determined by birth and death events. New individuals are recruited into the population at a rate that depends on the reproductive rate of each tree and on seedling survival, and are removed from the population at a rate that depends on wood density, current neighborhood conditions and disturbances. The reproductive rate is a deterministic function of the mass production expressed in equation 4.4. This model is markovian because the forest conguration at the next time step only depends on the current conguration and external random processes (i.e. disturbance processes). 2.5 Model Implementation The model was implemented in C++. Because individual-based models can be computation- ally burdensome, it is imperative to store the forests state eciently. Therefore, we use the spatial Markov property of forest and divide the torus into coarse cells such that it suces to compute the interactions between same-cell and neighbor-cell trees. This representation greatly reduces the determination of all vital rates from quadratic to linear complexity in the total number of trees. Initial forest congurations are generated randomly and run without disturbances during a "burn in" period, e.g. until equilibrium is reached. At each xed time 62 step, we compute the energy budget and update the state of every tree in the forest. While it is unnecessary to store the point congurations in time, it is important to store the distributions of traits, sizes, and locations. 3 Results 3.1 Forest Ecology We rst run the model with one hundred randomly placed seeds on a domain of 10000 meter squared. The canopy development of the undisturbed forest is then observed until equilibrium is reached. In this section, we do not allow any trait mutation during the reproduction stage. In this experiment, we wanted to verify that the forest model behavior was comparable to a real one. As in the real world, three phases of establishment can be identied during canopy development (gures 3.2 and 3.3): Pure growth: the total biomass increases because of tree growth but the forest density stay constant (horizontal line in gure 3.3). In gure 3.2, this phase is illustrated by the height distribution being centered at a single-value that increases with time. Reproduction: the initial tree population reaches maturation height and starts allocating its energy to seed production thereby increasing the density of the forest and halting the biomass production (vertical line in gure 3.3) Self-thinning: competition for light and the rarefaction of gaps in the canopy put a limit on the forest density and total biomass. Once the equilibrium is reached, the forest properties oscillate along the diagonal line in gure 3.3 63 Figure 3.2: Tree height distribution (shades of green) evololving in time (horizontal axis) during the canopy development. Heights range from 0 (bottom) to 25 meters at the top of the vertical axis Figure 3.3: Self-Thinning: Leaf mass average (horizontal axis) vs. forest density (vertical axis) 64 3.2 Demographic Here, we observe the eect of varying disturbance regimes on the demographic composition of a multi-species forest. Multi-species forest are simply obtained by assigning dierent trait values to dierent trees. In particular, the traits are distributed accorded to a Gaussian distri- bution. Starting from an undisturbed forest conguration in its equilibrium state, we subject the forest to dierent disturbance regimes. In the following results, the average disturbance return period is decreased from 100-year to 2-year between two successive disturbance oc- currences. We observe the impact of the dierent disturbance regimes in terms of salient aggregate properties of forest (gures 3.4, 3.5, 3.6 and 3.7), namely the leaf area index, the average height, the net primary production and the biomass density. Those four aggre- gate properties were chosen to facilitate future comparisons with results obtained with Falster et al. [2011] size-structured metapopulation model. Looking at the evolution of the trait dis- tributions as the forest is subjected to disturbance regimes, we see interesting branching-like patterns (gure 3.8 and 3.9). 3.3 Evolutionary During the reproduction phase, trees transmit their functional trait to their osprings. For this experiment, traits values are allowed to mutate arbitrarily as they are being transferred from parent to ospring. Here, a one percent standard deviation around the trait value was used as the allowable mutation with a Gaussian distribution. Let y i be the considered functional trait of tree X i allowed to mutate at reproduction, thus y i 2flma i ; hmat i ; i ; s i g. Then when passed to its ospringsfX i k g k=1;:::;n i (t) , where n i (t) denote the number of osprings produced by X i at time t, y i k N (y i ; 0:1y i ). As described previously, the forest initial 65 Figure 3.4: In uence of dierent disturbance regimes on leaf area index Figure 3.5: In uence of dierent disturbance regimes on average height 66 Figure 3.6: In uence of dierent disturbance regimes on net primary production Figure 3.7: In uence of dierent disturbance regimes on biomass density 67 Figure 3.8: Starting with smooth Gaussian distribution for the LMA (dark green), the pop- ulation evolves (in shades of yellow) into 2 species and eventually only 1species is left under frequent disturbance regime 68 Figure 3.9: Starting with a wide Gaussian distribution for the maturation height (dark green), same pattern occurs as for the LMA 69 Figure 3.10: In uence of dierent disturbance regimes on leaf area index conguration is the steady-state of an undisturbed forest. We now allow the four functional traits characterizing the trees to vary and the disturbance regimes vary as explained in the previous section. First, we analyze the impact on the aggregate properties of the forest (gures 3.10, 3.11, 3.12, and 3.13), then we look at the density evolution of the tree functional traits in time (gures 3.14, 3.15, 3.16 and 3.17). 4 Discussion The spatially explicit and individual-based model presented in this work reproduces well-known ecological phenomena such as the self-thinning phase that occurs during the forest canopy establishment (gures 3.2 and 3.3). In the subsequent experiments, we have focused on a particular scenario: the impact of increasing disturbance frequencies on the steady-state of a multi-species forest. In the demographic study, the tree characteristics are passed from parent 70 Figure 3.11: In uence of dierent disturbance regimes on average height Figure 3.12: In uence of dierent disturbance regimes on net primary production 71 Figure 3.13: In uence of dierent disturbance regimes on biomass density to ospring unchanged. From this simple study, the model predicts a decrease in the average tree height of a forest subjected to more frequent disturbances (gure 3.5). This observation is explained by the fact that trees with smaller maturation height will naturally be selected in the case of frequent disturbances as the larger ones will likely be destroyed by disturbances before they reach maturation. We also observed the branching patterns that materialized in the multi-species perturbed forest simulation. In fact, from a distribution of values for the leaf mass per area and the maturation height (gures 3.8 and 3.9), only a couple of trait values subsist as time increases under frequent disturbances. In particular, with an average of 5 years between two successive disturbances, only one trait is selected. Regarding the evolutionary study, it seems that the large range of possible trait values often drives the forest out of its equilibrium state. The forest experiences more transient phases before reaching what seems 72 Figure 3.14: Starting with smooth Gaussian distribution for the LMA (dark green), the distri- bution spread slightly but seems unaected by changes in the disturbance regime 73 Figure 3.15: Starting with smooth Gaussian distribution for the maturation height (dark green), the distribution spreads slightly and shifts to smaller maturation heights as time in- creases and as disturbance regimes become more frequent 74 Figure 3.16: Starting with smooth Gaussian distribution for the wood density (dark green), the spread of the distribution with time is less pronounced for more frequent disturbances. Again, in this case we can observe a slight shift to less dense wood species as disturbances become more frequent 75 Figure 3.17: Starting with smooth Gaussian distribution for the seed size (dark green), the spread of the distribution with time is less pronounced for more frequent disturbances. Here, we can observe a strong shift to forests producing smaller seed sizes as disturbances become more frequent 76 to be local steady states (gure 3.10, 3.11, 3.12, and 3.13). In addition, many trait values are explored but many also die out. 5 Conclusion The model presented in this chapter provides a exible simulation tool to study the impact of dierent disturbance regimes on forests. Indeed, the four functional traits characterizing trees enable single species forest representations when only one set of values is used, as well as mul- tiple species forests when distributions of values are employed thereby facilitating the design of arbitrary forest congurations but also a variety of real-world vegetation patterns. Hence, this model is capable of carrying out unrestrained simulations under numerous disturbance regime scenarios. Furthermore, the predicted forest response is formulated in terms of demographic impacts and eects on aggregate forest properties, in addition to evolutionary impacts on the tree functional traits and forest species composition. The dierent experiments simulated in this sturdy, however, reveal only a small fraction of the model capabilities. This simulation tool could also provide decision support for policy makers especially regarding the control of human generated disturbances such as logging. We plan to further explore the evolutionary processes of forests with additional in depth experiments such as the investigation of the dif- ferent functional traits independently at rst in order to better interpret the present results, and then examine coevolution. We still need to compare the model predictions with other studies and empirical data to truly verify their accuracy. Especially, a systematic comparison of our results with the ones obtained by Falster et al. [2011] metapopulation models should also provides strong grounds to evaluate the situations where the use of complex spatial models can improve the accuracy of the predictions. 77 Computer power still remains an obstacle to simulate models of complex systems. There- fore there exist, in the research community, an ongoing eort to develop increasingly more ecient representations. Whether in the form of parallel algorithms or of mathematical ap- proximations, those methods are necessary to run simulations over long time-scales. We need to keep in mind though that approximations are adequate as long as one has a sense of the behavior of the error between the true model output and the reduced one. In the next chapter, a model reduction method is presented. It consists in a coarse-graining procedure applied to the spatially explicit and individual-based model of forest that was presented in the previous chapter. 78 Chapter 4 Multiscale Forest Dynamics Abstract This chapter investigates a multiscale characterization of individual-based dynamics. At the ne scale, an individual-based and spatially explicit point process model is used to describe ecological processes relevant to the evolution of individuals and their mutual interactions. At the coarse scale, a stochastic model is synthesized to describe the dynamics of the system on larger spatial scales. The upscaling technique is designed to eciently coarse-grain very ne scale variables in order to express new state dynamics that no longer require access to ne details. The coarse dynamics are carefully constructed to preserve uctuations from localized competition by retaining subgrid information obtained from precomputed Monte- Carlo simulations of the ne scale dynamics. Implications to computational eciency and cross-scale coupling are signicant. The present technique is then described in detail as it is applied to an individual-based and spatially-explicit forest dynamics model. 79 1 Introduction A challenge to managing complexity is the disparity of scales that matter to a particular decision. Understanding and managing the carbon cycle, for instance requires integration of knowledge and information from geophysical phenomena, observed and evolving over spatial scales of kilometers with forest growth, controlled and observed at the scale of individual trees. Similar observations are relevant to the management of processes taking place at the interface of urban and natural realms, such as urban wildres, where large scale processes such as wind interact with ne scale processes such as individual homes (delineating critical objectives), and individual trees (providing the fuel for the re). Individual-based models constitute potent analysis and exploration tools particularly in the presence of strongly nonlinear dynamics, where governing dierential equations for coarse scale observables may not be explicitly available. Often in these cases the complexity of interactions between individual agents is such that long- term prediction of quantities of interest remains computationally prohibitive if not elusive. This challenge to predictability is compounded by a lack of observability of the ne scales, usually remedied via their statistical description. Another confounding factor is the fact that, in many cases of signicance, quantities of interest to decision-making evolve on spatial and temporal scales that are distinct from those on which individual agents are described. This multitude of relevant scales is typical, generally, in complex systems and in particular in ecology where evolutionary processes manifest themselves on particularly long time scales. From a modeling perspective, as one xes the resolution of a model, processes can be divided in two exclusive categories: resolved scales and unresolved scales, i.e. structures that cannot be seen at the chosen resolution. Multiscale models attempt to relieve the modeler from arbitrary parametrizations or excessively ne resolution. The present work is an attempt to 80 develop upscaling strategies to describe, on a coarse scale, the dynamics driving individual- based models. Classical multiscale techniques have been abundantly used in the eld of molecular dy- namics where an eective macroscopic model is derived from microscopic equations through local averaging over temporal or spatial windows. Those techniques are referred to collec- tively as homogenization methods in the eld of continuum mechanics (Hou and Wu [1997]). In situations involving cooperative features evolving over many spatial and temporal scales, classical multiscale techniques tend to average out critical ne scale events responsible for phase transformation such as instability, bifurcation, and failure. In more recent approaches, there is a widespread attempt to maintain the eects of the localized features by augmenting a macroscale model with microscale information (from a model) as needed. Those meth- ods, which often rely on scale separability, include the projective and equation free methods (Gear and Kevrekidis [2003], Erban et al. [2006] and Givon et al. [2004]), the quasicontin- uum methods (Shenoy et al. [1999] and Miller and Tadmor [2002]), and superparametrization techniques, used specically in climate modeling, where the turbulent transport parameter is computed from "on the y", localized ne scale lattice simulations (Khouider et al. [2003], Majda et al. [1999], Majda et al. [2001], Majda et al. [2002], Majda et al. [2003], Majda and Khouider [2002], Majda and Grote [2009], Majda et al. [2010] and Xing et al. [2010]). The lat- ter depending heavily on work described in Katsoulakis and Vlachos [2003], Katsoulakis et al. [2003b], Katsoulakis et al. [2003a], Katsoulakis and ans A. Sopasakis [2006] and Katsoulakis and Trashorras [2006] where mathematical strategies to coarse-grain particle dynamics on microscopic lattice systems are rigorously developed. In this body of work, the authors begin with stochastic Ising-type microscopic models of particle dynamics on lattice systems, from which they derive approximating coarse-grained birth-death Markov process. Having fewer 81 observables, the coarse random process is far more suitable for computationally extensive sim- ulations, while retaining much of the microscopic interactions between neighboring particles, thereby allowing to describe the dynamics in mesoscopic or macroscopic length scales. Those multiresolution methods parameterize the level of coarse-graining to be performed, and oer exibility by allowing a progressive approach as well as the use of several resolutions within one simulation (see Ismail et al. [2002] in the wavelet transform context). Our multiscale technique is here applied specically to a forest dynamics modeling problem. Forest dynamics are inherently multiscale. At the level of individual trees, the seed dispersal and the competition for light and nutrients are the main drivers of the dynamics. However, natural and anthropogenic disturbances such as res, which are swift to reshape forests but also can entirely change the species composition of forests, act at mesoscopic length scales. Finally, climate and the available nutrients in the soil are driving factors at macroscopic scales. In our setting, we do not have a dened macroscale model available to us. We only have a spatially explicit and individual based model of forest with dynamics depending on very ne scale variables. While individual-based forest models successfully reproduce real forest behaviors (see forest described as spatio-temporal point processes in Gavrikov and Stoyan [1995], Mateu et al. [1998], Comas and Mateu [2007], Comas and Mateu [2008], S arkk a [2006], S arkk a and Renshaw [2006], Stoyan and Stoyan [1998], Stoyan and Penttinen [2000], Busing [1991] and Busing and Mailly [2004]), they are computationally intensive and prohibit long-term predictions on large spatial domains. Motivated by the high computational cost of resolving the ne scale information, together with the heterogeneity and uncertainty in such information, we derive a stochastic model, using a Markov process representation, which describes the dynamics using only coarse scale variables, without explicit coupling to ne scale variables. An important attempt of the present coarse graining procedure is to avoid averaging 82 out the highly non-uniform light distribution that occurs underneath the forest canopy. Indeed, the light that is available to a tree or seed is what drives its carbon dioxide assimilation rate and thus its growth rate or chances of germination. It is therefore crucial to accurately represent light availability in the coarse model as well, so as to reproduce the diversity of tree sizes that exists in natural forests. However the precise light distribution for a given forest conguration can only be obtained from a fully resolved simulation. By observing that the light reaching a tree strongly depends on the tree height and location within the canopy structure of the neighborhood, we construct probability distribution functions (pdf) of the canopy openness conditioned on available coarse scale information or patch properties such as density and leaf area index at dierent canopy depths. The pdfs are empirically computed from Monte-Carlo simulations run at the ne scale. Thereby, the light distribution is approximated within a forest stand via the conditional pdfs representing the canopy openness. At the coarse scale, the state of the forest is updated by sampling values of the canopy openness from those conditional pdfs to generate the increment of the Markov process representing the coarse-grained dynamics. Our approach is similar in principle to the stochastic parametrization technique developed by Crommelin and Vanden-Eijnden [2008], which the authors applied to the Lorenz 96 model. However, we do not run the microscale and the macroscale models concurrently, instead the ne scale simulation is run once, initially, to construct a set of conditional distributions. The chapter begins with a description of the ne scale forest model. This is followed with a description of the coarse-graining procedure and the development of the Markov process. Finally we illustrate our method with a numerical example. 83 2 Forest Model as a Point Process Acquiring a detailed understanding of vegetation dynamics is challenging due to the many diverse processes that drive the ecology and evolution of vegetation structures. At the phys- iological level, light, moisture, temperature, and nitrogen availability are the key factors to which plants have to adapt. At the level of an individual tree, neighborhood interactions that arise from competition for light and from soil processes in uence the local conditions that in turn aect tree development. At the level of a stand or forest, natural and anthropogenic disturbances such as re, disease spread, or logging greatly in uence the size distribution and species composition of vegetation structures. The individual-based model of forest dynamics that is described in the following sections constitute a framework for the theoretical study of forest as an evolutionary process. The physiological model for trees was developed by Falster et al. [2011] where the authors describe the necessary factors involved in describing the life cycle of trees and their in uence on aggregate properties of forests. In the present work, we use this model and the same parameter values in a spatially explicit setting where trees are dened by their coordinates and interact by competing for light in a full three-dimensional space. 2.1 Point Process Representation A forest is described as a spatio-temporal point process evolving on the d-dimensional torus T := [0; 1] d . The torus is obtained by identifying opposite sides of the hypercube in R d , hence it has the characteristic of being nite yet unbounded, which simplies the problem mathematically while providing a rational resolution of boundary conditions. A point process on a domain is identied at any given time if one can enumerate all the individuals contained 84 in any region A of . This number will be denoted by N(A). A forest conguration t is dened by the set of all the trees t = n X i ; i = 1;:::;N(t) o occupying the forest domain at a given timet. Thus,N(t) is the counting process associated with the number of individuals in the forest at xed timet. The trees are individually characterized by their spatial coordinates x i 2T and by their functional traits f i 2M. The traits, which identify the tree species, were chosen according to Falster et al. [2011]'s study. Specically, the traits that mark each individual consist of the leaf mass per area (lma), the maturation height (hmat), the wood density () and the seed size (s) (at birth and for ospring), thereforef i = (lma i ; hmat i ; i ; s i ) where each component is a scalar value inR. 2.2 Individual-Based Dynamics The tree dynamics comprises four dierent phases, which constitute its life cycle, namely establishment (whether the dispersed seed germinates or not), growth (production of tissues), reproduction (when a tree reaches its maturation height, it starts producing and dispersing seeds) and nally death. At each one of these stages, the tree survival and evolution depends on how much matter it is able to produce. It has to generate at least enough matter to maintain the existing tissues, and even more in order to grow or reproduce. The energy required for a plant to produce any matter is generated through photosynthesis: the strength of light that is received by leaves determines the amount of carbon dioxide that a plant can assimilate and convert into energy for mass production. Being mainly driven by how much light the leaves can intercept, tree dynamics are very sensitive to the shade resulting from overlapping tree canopies. In that sense, two trees are interacting if their foliages overlap. To compute the amount of light that is available in dierent areas of the forest canopy, we assume that trees have at tops, i.e., their respective crowns are composed of a single layer 85 of leaves at the top of the tree (gure 3.1). Higher layers shade the layers beneath if they overlap horizontally. The shading intensity is weighted with opacity coecients to account for the vertical separation between tree crowns. Growth and Reproduction To compute the total mass that a tree can produce at a given time, one needs to evaluate its carbon dioxide assimilation rate. Let A i (t) be the assimilation rate of tree X i at time t, A i (t) is a function of how open the forest canopy is at the height of the tree denoted by h i . Let E(x;h) represent the canopy openness value at x2T and height h, it can be expressed as a function of the leaf area index (LAI) at that point L(x;h), which depends on the total leaf area above that point. The LAI can be very costly to compute explicitly as the number of trees increases in the forest (see gure 3.1). E(x;h)2 (0; 1] is one if the canopy is fully open at point (x;h). Let a i be the ground area occupied by the canopy of tree X i , ml i be the leaf mass of X i and ! i = ml i =lma i be the total leaf area of individual X i , then we can write: A i (t) = X x2a i g E (E(x;h i )) (4.1) E(x;h) =e cL(x;h) (4.2) L(x;h) = X k:(x2r k )\(h k h) ! k =a k (4.3) where g E denotes a deterministic but nonlinear mapping (equation 3.4) and c a species specic opacity coecient. Once the tree assimilation rate is computed, one can derive the mass production rate and update the state of each individual by allocating energy to the 86 maintenance of existing tissues, growth, seed production and death probability according to the following relationships, m i (t + t) =m i (t) +g m (A i (t))t ; i = 1; ;N(t) (4.4) where g m (described in equation 3.6) denotes a deterministic and known relationship, m i (t) denotes mass at time t of individual X i . And from m i , the size and composition of trees can be updated as well as the seedling rate and death probability. 3 Coarse-Grained Model The spatially explicit individual-based model described above is computationally intensive because the interaction processes are computed explicitly through the LAI value L(x;h) and the canopy openness value E(x;h) across the leaf area of every individual. The required computer memory therefore greatly limits the size of the forest domain under investigation, which compelled us to explore an alternate approach. In the remainder of the chapter, we show how to construct a Markov model for the tree dynamics on a suitable chosen coarse spatial scale, where the noise characterization comes from ne scale simulation. 3.1 Neighborhood Description We divide the spatial domain into neighborhoods or forest stands by overlaying a regular grid of size dx onto the torus. The cells, denoted by n D j 2T ; j = 1;:::;N D o where N D is the total number of cells obtained by this process, are characterized at time t by the number of individuals, N D j (t), in the neighborhood and the set of individuals n X j k ; k = 1;:::;N D j (t) o located in that same neighborhood. TheX j k are characterized as described previously, however 87 their explicit coordinates within the patch will no longer in uence the stand dynamics in the coarse grained representation. 3.2 Conditional Stochastic Dynamics Assuming we can nd an eective value expressed as E eff j k (t) that approximates the canopy openness across the entire leaf exposure area of a tree X j k , then we can compute the carbon dioxide assimilation rate as A j k (t) = g E (E eff j k (t)). Because the function g E is invertible, at the ne scale, we can explicitly compute E eff j k (t) =g 1 E (A j k (t)). Monte-Carlo simulations of the ne scale dynamics allow us to characterize the probability distribution functions l (E eff j k ) conditioned on homogenized LAI values: l (E eff j k ) =P E eff j k L(h j k )2 [l;l + 1] (4.5) L(h) = X j l :h j l h ! j l Area(D j ) (4.6) For l = 0; 1;:::;n. Where L(h) is the light distribution parameter at height h that is homogenized across the forest stand (or homogenized LAI) andArea(D j ) is the area of stand D j . At l = 0, 0 (E eff j k ) represents the distribution of the canopy openness parameter for trees residing at the top of the canopy. Since no shading occurs, the canopy is fully open with probability 1. Driving Noise Characterization Once the conditional distributions are available, one can sample an eective canopy openness values for any focal tree X j k in the forest stand based on the homogenized LAI at the tree 88 Figure 4.1: Coarse-grained Interaction between trees: light values at dierent depths are homogenized across the patch height h j k given a neighborhood conguration. From there, the increment of the Markov process driving the dynamics of tree X j k 2D j is easily computed as follows: m j k (t + t) =m j k (t) +g m ( t )t ; k = 1; ;N D j (t) (4.7) t =g E ( t ); t l j k (E eff j k ) (4.8) Where l j k is such that l j k L(h j k )<l j k + 1 and t is a stochastic process conditioned on coarse descriptors at the level of the neighborhood. 4 Computational Steps The approach relies on both the ne and the coarse scale representations, however the ne scale model is only run once, initially, and on a small spatial domain, to empirically compute 89 the required conditional distribution functions. The procedure can be summarized in the fol- lowing steps and the pseudo-code that describes the implementation of the proposed upscaling methodology is described in algorithm 6 below. 1. Monte Carlo simulation: the spatially explicit individual-based model is run on a domain of the size of a forest stand (equation 4.4) 2. Construct the conditional pdfs l (E eff i ) for l = 1; 2;:::;n (equation 4.5) 3. Coarse grained simulation (equation 4.7) Algorithm 6 Coarse dynamical model for time t from 1 to T do for D j ;j from 1 to N D do L(h) = P j k :h j k h ! j k Area(D j ) , h2 [0;h max ] for all X j k 2D j do Draw j k (t) from l j k (E eff j k ) Compute assimilation rate j k (t) =g E ( j k (t)) Update individual mass m j k (t + t) =m j k (t) +g m ( j k (t))t Update individual size, seedling rate and death probability For each new seeds, draw from n (E eff ) if =g E () is large enough, recruit new individual end for end for end for 90 5 Results and Discussion A series of simulation experiments were designed and run on the spatially explicit and individual- based model to determine meaningful coarse descriptors. Monte Carlo simulations at the ne scale level subsequently allowed for the empirical probability distributions conditioned on those aggregate properties to be computed. The coarse-dynamics for the reduced model result from the stochastic parametrization of those distributions. Figure 4.2 illustrates sample trajectories of eective canopy openness values for individual trees during their life time. Those paths were drawn from the individual-based and spatially explicit model. The chaotic nature of those trajectories, namely the sudden jumps and vari- ations in the slopes of the curves, is mainly due to the interactions between individuals such as large trees lowering the light availability underneath their foliage or inversely large trees dying, thereby creating a gap in the forest canopy allowing for new recruits and growth of smaller trees. Because those events are highly localized, it is challenging to reproduce this phenomenon with a coarse-scale approach given the approximative individual coordinates (at the coarse scale, each tree is associated to a patch representing a forest neighborhood). As we are attempting to approximate the individual mass production with stochastic pro- cesses, let us examine the conditional pdfs of the eective tree exposure variable E eff as shown in gure 4.3. Those are empirically computed from a large number of ne-scale sim- ulations. The conditioning on the aggregate LAI property taken at dierent depths of the stand canopy, allows for the construction of nearly well-dened distributions. Especially, one can notice the peak at 1 for trees located at the top of the canopy (i.e. the larger ones) and a clear shift of the distribution towards 0 as the leaf area index increases. This result is 91 expected, as the canopy should be fully open above trees residing at the top of the canopy with probability 1. To verify the accuracy of our coarse model, we compare properties of the forest stand obtained with the ne and the coarse scale dynamics. Forest aggregate properties include biomass density, average height and size distribution, germination and seedling rates. These properties characterize the forest because in the absence of external disturbances, the dynamics uctuate around equilibrium values that are the same across realizations. The two following results constitute a heuristic validation of our upscaling technique: First, the size distributions resulting from the micro and macro models illustrated in gure 4.4 exhibit a similar structure. The pdfs were constructed at steady state when forest stands have a majority of smaller size trees with a number of large trees occupying the top of the canopy. For single realizations of the ne and the coarse models (gure 4.5), and looking at trajectories of aggregate properties of the forest stand, one may notice that patterns and steady state values occurring during the ne scale simulations are conserved in the stochastic coarse-scale dynamics. This comparison shows that the coarse model can approximate the ne scale dynamics accurately. Especially, it can reproduce the rapid changes in mass production rate when sudden variations in the light availability due to interaction at the ne scale occur under the forest canopy. Finally, gure 4.6 illustrates the information loss that results from the coarse-graining method. While the size structure of the forest (gure 4.4) is maintained in the reduced model, the spatial conguration has lost the tight clusters of small trees growing in between the large ones. This is due to the use of the homogenized LAI L(h) rather than the localized 92 values L(x;h). Loosing the spatial structure of forest makes sense since the explicit tree coordinates are no longer utilized in the coarse dynamics, however, the information about the forest composition is accurately retained. 6 Conclusion In this study, we have designed a general upscaling methodology intended to describe the dynamics of individual-based and spatially explicit models using dierent length-scale variables. With suitable modications, it could be adapted to other spatially explicit individual based model. However, the success of the approach relies on the coarse scale descriptors that are chosen to condition the noise model. Coarse-graining requires careful handling; especially, we should keep in mind that the modication of a model dened at a specic scale to describe the process at a dierent scale may perturb the subtle balance between the input and the model error contributions. In this particular case though, the various results comparing the size structure as well as aggregate properties of the forest induced by the coarse dynamics with the ne scale model, which we assume to be the true model, heuristically validate our stochastic parametrization. Thus, it isn't necessary to explicitly resolve the interaction processes, when some particular aggregate neighborhood properties can predict the dynamics accurately. With this technique, we were able to reduce the computational time by a factor of 200, which enables an entire new range of analysis including the study of long term processes such as changes in forest demographics and evolutionary dynamics. The gain in eciency renders the model scalable to analysis on large spatial domains, which can lead to realistic scenario running in the context of forest management as well as solving optimization problems over the impact of possible policies. 93 0 20 40 60 80 100 120 140 160 180 200 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Time E eff Sample effective canopy openness trajectories Figure 4.2: Eective canopy openness trajectories for a sample of trees: values are obtained from the ne scale simulations. Time units are years 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 E eff Canopy Openness distribution conditioned on LAI value LAI in [0,2) LAI in [2,3) LAI in [3,3.5) LAI in [3.5,4) LAI in [4,4.5) LAI in [4.5,5) Figure 4.3: Probability distribution functions of canopy openness conditioned on dierent Leaf Area Index values of the forest canopy 94 −2 0 2 4 6 8 10 12 14 16 0 0.005 0.01 0.015 0.02 0.025 0.03 0.035 0.04 Height of Individuals Fine scale dynamics: Height distribution at steady state −2 0 2 4 6 8 10 12 14 16 0 0.005 0.01 0.015 0.02 0.025 0.03 0.035 0.04 Height of Individuals Coarse scale dynamics: Height distribution at steady state Figure 4.4: Forest size distribution at steady state for the ne scale model and the coarse scale model, top and bottom gures respectively 95 0 100 200 300 400 500 600 700 800 900 1000 5 10 15 20 Aggregate forest properties (fine scale model): Biomass Density 0 100 200 300 400 500 600 700 800 900 1000 0 2 4 6 Average Height 0 100 200 300 400 500 600 700 800 900 1000 0 0.2 0.4 Germination to seed ratio 0 100 200 300 400 500 600 700 800 900 1000 5 10 15 20 Aggregate forest properties (coarse scale model): Biomass Density 0 100 200 300 400 500 600 700 800 900 1000 0 2 4 6 Average Height 0 100 200 300 400 500 600 700 800 900 1000 0 0.2 0.4 Germination to seed ratio Figure 4.5: Time evolution of particular forest aggregate properties for the ne scale model and the coarse scale model, top and bottom gures respectively 96 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 0 1 2 3 4 5 6 7 8 9 10 Figure 4.6: Forest conguration for the ne scale model and the coarse scale model, top and bottom gures respectively 97 Chapter 5 Conclusion and Outlook The dierent chapters of this thesis illustrate how stochastic and multiscale models can in- form the decision-making process not only by making predictions but also by evaluating those assessments. In Chapter one, the design of a sensor network aimed at maximizing the prob- ability of detection of accidents and intrusions in water distribution systems was addressed. A stochastic parametrization allowed for the introduction of uncertainty and heterogeneity in the water demand patterns as well as imperfect sensor responsiveness. The results showed that the optimal imperfect sensor placement was more robust to uctuations in the demand values than the perfect sensor placement. However, the number of sensors required to eec- tively protect water networks in the imperfect sensor case is signicantly higher than when sensors are assumed to be perfect detectors. Also, a two-stage analysis was undertaken to optimize the placement of an additional sensor network (or second stage sensors) as soon as a contamination event is detected by the single-stage sensor network in place. The method showed that once the Monte Carlo simulations have been run for the rst stage, the second stage becomes an ecient source identication approach requiring little computational power. Chapter two tackled the design of a spatially explicit and individual-based model to study the impact of dierent disturbance regimes on forests. Two dierent analyzes were undertaken: 98 one examined the demographic impacts ignoring trait mutation possibilities and the other one looked at the longer term eects on the evolutionary processes. The demographic study re- produces well-known ecological phenomena such as the self-thinning phase during the canopy development. It also corroborates the observation that the average tree height of a forest decreases when it is subjected to frequent disturbances. The initial forest conguration being composed of multiple species, the model also shows branching pattern and the selection of as few as one or two species as the disturbance regime increases in intensity. The evolutionary study, which consists in allowing trait mutations to occur at a tree reproduction stage, while less conclusive shows that an increase in the frequency of the disturbances drives the forest out of its equilibrium into transient phases and also greatly reduces biodiversity through a selection process in the long run. Finally chapter three investigated a coarse-graining methodology applied to the individual- based forest model developed in the rst chapter. This approach derives coarse dynamics from the individual-based models, which takes the form of a Markov process. The process of interest here is the assimilation rate of individual trees that uctuates based on the local lighting environment. The stochastic approximation driving the assimilation rate trajectories, which is solely conditioned on coarse scale discriptors, reproduces the real ones quite closely while requiring only a fraction of the computational power used for the ne scale model. The gain in eciency renders the model scalable to analysis on large spatial domains, which can lead to realistic scenario running in the context of forest management as well as solving opti- mization problems over the impact of possible policies. Building ecient models rely greatly on identifying the relevant questions of concern. Therefore, it is important to bring the decision-making element as early as possible in the 99 modeling process. Going further, a close collaboration with policy-makers could lead to the addition of a layer of modeling allowing for a policy to be simulated and its impacts anticipated before being implemented in the real world. For example the forest dynamics model can easily be used in a forest management context, where one would devise and compare dierent thinning (harvesting) strategies. The denition of the optimal strategy would not only consider short term benets but also long term impacts on biodiversity, which we know plays a key role in sustainability. Ecological models constitute a great asset for policymakers as they allow us to see further into the future and help us make sustainable decisions. 100 Bibliography K Abed-Meraim and Y Grenier. Blind separation of audio sources using modal decomposition. Proceedings of the 8th International Symposium on Signal Processing and Its Applications, 2:451{454, 2006. F Acevedo, D. L. Urban, and H. H. Shugart. Models of forest dynamics based on roles of tree species. Ecological Modelling, 87:276{284, 1996. C. Andrieu, M. Davy, and A. Doucet. Ecient particle ltering for jump markov systems. application to time-varying autoregressions. Signal Processing, Jan 2003. A. Baddeley. Spatial point processes and their applications. Springer, Lecture Notes in Mathematics, 2007. A. Baddeley, P. Gregori, J. Mateu, R. Stoica, and D. Stoyan. Case Studies in Spatial Point Pattern Modelling. Springer, Lecture Notes in Statistics, 2006. O. E. Barndor-Nielsen, D. R. Cox, and C. Kl uppelberg. Complex Stochastic Systems. Chapman & Hall/CRC, 2001. J. Berry, L. Fleischer, W.E. Hart, C.A. Phillips, and J.P. Watson. Sensor placement in municipal water networks. Water Resources Planning and Management, 131(3):237{243, 2005. J. Berry, W.E. Hart, V.J. Leung, C.A. Phillips, and J.P. Watson. On the placement of imperfect sensors in municipal water networks. In In Proceedings of the 8th Symposium on water distribution systems, 2006a. J. Berry, W.E. Hart, J.G. Uber, and J.P. Watson. Sensor placement in municipal water networks with temporal integer programming models. Water Resources Planning and Man- agement: special issue on drinking water distribution systems security, 2006b. B. Bollob as. Graph Theory: An Introductory Course. Springer, 1994. D. B. Botkin, J. F. Janak, and J. R. Wallis. Some ecological consequences of a computer model of forest growth. The Journal of Ecology, 60(3), 1972. 101 A. O. Boudraa and J. C. Cexus. Emd-based signal ltering. IEEE Transactions on Instru- mentation and Measurement, 56(6):2196{2202, 2007. F. Bousquet and C. LePage. Multi-agent simulations and ecosystem management: a review. Ecological Modelling, 176(3-4), 2004. R. T. Busing. A spatial model of forest dynamics. Vegetatio, 92, 1991. R. T. Busing and D. Mailly. Advances in spatial, individual-based modeling of forest dynamics. Journal of Vegetation Science, 15:831{842, 2004. J.V. Candy. Bayesian Signal Processing: Classical, Modern, and Particle Filtering Methods. Wiley, 2009. F. Caron, M. Davy, E. Du os, and P. Vanheeghe. Particle ltering for multisensor data fusion with switching observation models: Application to land vehicle positioning. IEEE Transactions on Signal Processing, 55(6):2703{2719, June 2007. ISSN 1053-587X. R.D. Carr, H.J. Greenberg, W.E. Hart, G. Konjevod, E. Lauer, H. Lin, T. Morrison, and C.A. Phillips. Robust optimization of contaminant sensor placement for community water systems. Mathematical Porgramming: Series A and B, 107(1):337{356, 2006. P. Castiglione, M. Falcioni, A. Lesne, and A. Vulpiani. Chaos and Coarse Graining in Statis- tical Mechanics. Cambridge University Press, 2008. S. Chang, W. Svelka, and M. Shinozuka. Linking infrastructure and urban economy: sim- ulation of water-disruption impacts in earthquakes. Environment and Planning B: Planning and Design, 29:281{301, 2002. C. Comas and J. Mateu. Modelling forest dynamics: A perspective from point process methods. Biometrical Journal, 49(2), 2007. C. Comas and J. Mateu. Space-time dependence dynamics for birth-death point processes. Statistics and Probability Letters, 78, 2008. A. Corral, L. Telesca, and R. Lasaponara. Scaling and correlations in the dynamics of forest- re occurrence. Physical Review E - Statistical, Nonlinear and Soft Matter Physics, 77(1), 2007. D Crommelin and E Vanden-Eijnden. Subgrid-scale parameterization with conditional markov chains. Journal of the Atmospheric Sciences, 65(8):2661{2675, 2008. D.J. Daley and D. Vere-Jones. An Introduction to the Theory of Point Processes. Springer, 2003. R de Borst. Challenges in computational materials science: Multiple scales, multi-physics and evolving discontinuities. Computational Materials Science, Jan 2008. 102 A. Doucet and C. Andrieu. Iterative algorithms for state estimation of jump markov linear systems. IEEE Transactions on Signal Processing, 49(6):1216{1227, 2001. A. Doucet, A. Logothetis, and V. Krishnamurthy. Stochastic sampling algorithms for state estimation of jump markov linear systems. IEEE Transactions on Automatic Control, 45(2): 188{202, 2000. A. Doucet, N. de Freitas, and N. Gordon. Sequential Monte Carlo methods in practice. Springer, 2001a. A. Doucet, N. J Gordon, and V. Krishnamurthy. Particle lters for state estimation of jump markov linear systems. IEEE Transactions on Signal Processing, 49(3):613{624, 2001b. H Dragert, K Wang, and TS James. A silent slip event on the deeper cascadia subduction interface. Science, 292(5521):1525, 2001. B. Drossel and F. Schwabl. Self-organized critical forest-re model. Physical Review Letters, 69, 1992. R. Durrett. Probability: Theory and Examples. Cambridge University Press, 2010. U.S. EPA. Assessing potential impacts associated with contamination events in water dis- tribution systems: A sensitivity analysis. Technical report, U.S. Environmental Protection Agency, Washington, DC, EPA/600/R-10/061, 2010. R. Erban, I. G. Kevrekidis, and H. G. Othmer. An equation-free computational approach for extracting population-level behavior from individual-based models of biological dispersal. Physica. D, 215(1), 2006. Daniel S Falster, Ake Br annstr om, Ulf Dieckmann, and Mark Westoby. In uence of four major plant traits on average height, leaf-area cover, net primary productivity, and biomass density in single-species forests: a theoretical investigation. Journal of Ecology, 99(1):148{ 164, 2011. P Flandrin and G Rilling. Empirical mode decomposition as a lter bank. Signal Processing Letters, Jan 2004. L. E. Frelich. Forest Dynamics and Disturbance Regimes: Studies from Temperate Evergreen- Deciduous Forests. Cambridge University Press, 2002. J Fukuda, T Higuchi, S Miyazaki, and T Kato. A new approach to timedependent inversion of geodetic data using a monte carlo mixture kalman lter. Geophysical Journal International, 159(1):17{39, 2004. J. Fukuda, S. Miyazaki, T. Higuchi, and T. Kato. Geodetic inversion for space-time distri- bution of fault slip with time-varying smoothing regularization. Geophysical Journal Interna- tional, 2008. 103 V. Gavrikov and D. Stoyan. The use of marked point processes in ecological and environ- mental forest studies. Environmental and Ecological Statistics, 2, 1995. CW Gear and IG Kevrekidis. Telescopic projective methods for parabolic dierential equa- tions. Journal of Computational Physics, Jan 2003. D Givon, R Kupferman, and A Stuart. Extracting macroscopic dynamics: model problems and algorithms. Nonlinearity, Jan 2004. R Granat. Detecting regional events via statistical analysis of geodetic networks. Compu- tational Earthquake Physics: Simulations, Analysis and Infrastructure, Part II, pages 2497{ 2512, 2007. G. Grimmett and D. Stirzaker. Probability and Random Processes. Oxford University Press, 1992. F Gustafsson. Adaptive ltering and change detection. John Wiley & Sons, 2000. William E. Hart and Regan Murray. Review of sensor placement strategies for contamination warning systems in drinking water distribution systems. Journal of Water Resources Planning and Management, 136(6):611{619, 2010. C. L Henley. Self-organized percolation: a simpler model. Bulletin of the American Physical Society, 34, 1989. G. B. M. Heuvelink. Uncertainty analysis in environmental modelling under a change of spatial scale. Nutrient Cycling in Agroecosystems, 50, 1998. T Higuchi and J Fukuda. Application of monte carlo mixture kalman lter to gps data analysis for space-time inversion. Statistical Signal Processing, Jan 2004. T.P. Holmes, R.J. Huggett, and A.L. Westerling. Statistical analysis of large wildres, In The Economics of Forest Disturbances: Wildres, Storms, and Invasive Species. Springer, Dordrecht, The Netherlands, 2008. TY Hou and XH Wu. A multiscale nite element method for elliptic problems in composite materials and porous media* 1. Journal of Computational Physics, Jan 1997. N. E. Huang. Introduction to the hilbert huang transform and its related mathematical problems. Hilbert-Huang Transform and its Applications, 5:1{26, 2010. N. E. Huang and S. S. Shen. Hilbert-Huang transform and its applications. World Scientic Publishing Co., 2005. N. E. Huang and Z. Wu. A review on hilbert-huang transform: Method and its applications to geophysical studies. Reviews of Geophysics, 46(2), 2008. 104 N. E. Huang, M. L. C. Wu, S. R. Long, S. S. P. Shen, W. Qu, P. Gloersen, and K. L. Fan. A condence limit for the empirical mode decomposition and hilbert spectral analysis. Pro- ceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, 459(2037), 2003. NE Huang, Z Shen, SR Long, MC Wu, HH Shih, Q Zheng, NC Yen, CC Tung, and HH Liu. The empirical mode decomposition and the hilbert spectrum for nonlinear and non-stationary time series analysis. Proceedings: Mathematical, Physical and Engineering Sciences, 454 (1971):903{995, 1998. T. Huang and W. Ren. The orthogonal hilbert-huang transform and its application in earth- quake motion recordings analysis. 14th World Conference on Earthquake Engineering, Jan 2008. A. E. Ismail, G. C. Rutledge, and G. Stephanopoulos. Multiresolution analysis in statistical mechanics using wavelets to calculate thermodynamic properties. The Journal of chemical physics, 118(10), 2002. A.H. Jaswinski. Stochastic processes and ltering theory. Academic Press, 1979. S. J Julier and J. K Uhlmann. New extension of the kalman lter to nonlinear systems. Proceedings SPIE, Signal Processing, Sensor Fusion, and Target Recognition, 3068:182{193, 1997. A Kacha, G Hocepied, and F Grenez. Analysis of epileptic eeg signals by means of empirical mode decomposition and time-varying two-sided autoregressive modelling. 4th European Conference of the International Federation for Medical and Biological Engineering, pages 1231{1235, 2009. M. Katsoulakis, A. J. Majda, and D. G. Vlachos. Coarse-grained stochastic processes for microscopic lattice systems. PNAS, 100(3), 2003a. M. A. Katsoulakis and P. Plechac ans A. Sopasakis. Error analysis of coarse-graining for stochastic lattice dynamics. SIAM journal on numerical analysis, 44(6), 2006. M. A. Katsoulakis and J. Trashorras. Information loss in coarse-graining of stochastic particle dynamics. Journal of Statistical Physics, 122(1), 2006. M. A. Katsoulakis and D. G. Vlachos. Coarse-grained stochastic processes and kinetic monte carlo simulators for the diusion of interacting particles. The Journal of chemical physics, 119(18), 2003. M. A. Katsoulakis, A. J. Majda, and D. G. Vlachos. Coarse-grained stochastic processes and monte carlo simulations in lattice systems. Journal of Computational Physics, 186(1), 2003b. 105 B. Khouider, A. J. Majda, and M. A. Katsoulakis. Coarse-grained stochastic models for tropical convection and climate. PNAS, 100, 2003. D Kim and HS Oh. Emd: A package for empirical mode decomposition and hilbert spectrum. The R Journal, 2009. S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220(4598), 1983. A. Krause, J. Leskovec, C. Guestrin, J. VanBriesen, and C. Faloutsos. Ecient sensor placement optimization for securing large water distribution networks. Journal of Water Resources Planning and Management, 134(6):516{526, 2008. A. Kun, B. Oborny, and U. Dieckmann. Intermediate landscape disturbance maximizes metapopulation density. Landscape Ecology, 24:1341{1350, 2009. P Ladev eze, G Puel, A Deraemaeker, and T Romeuf. Validation of structural dynamics models containing uncertainties. Computer Methods in Applied Mechanics and Engineering, 195(4-6):373{393, 2006. CD Laird, LT Biegler, BG van Bloemen Waanders, and RA Bartlett. Contamination source determination for water networks. Journal of Water Resources Planning and Management, 131:125, 2005. J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, and N. Glance. Cost- eective outbreak detection in networks. In In 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2007. S. A. Levin, B. Grenfell, A. Hastings, and A. S. Perelson. Mathematical and computational challenges in population biology and ecosystems science. Science, 275:334{343, 1997. W. Li and Y. Jia. Rao-Blackwellised unscented particle ltering for jump markov non-linear systems: an h approach. IET Signal Processing, 5(2):187, 2011. ISSN 17519675. T Lieu and C Farhat. Adaptation of aeroelastic reduced-order models and application to an f-16 conguration. AIAA journal, 45(6):1244, 2007. C. P. Liou and J. R. Kroon. Modeling the propogation of water-borne substances in distri- bution networks. Journal of American Water Works Association, 79(11):54{58, 1987. S Ma and S Hartzell. Hilbert-huang transform analysis of dynamic and earthquake motion recordings. Journal of engineering mechanics, 129:861, 2003. A. J. Majda and M. J. Grote. Mathematical test models for superparametrization in anisotropic turbulence. PNAS, 106(14):5470{5474, 2009. A. J. Majda and B. Khouider. Stochastic and mesoscopic models for tropical convection. PNAS, 99(3), 2002. 106 A. J. Majda, I. Timofeyev, and E. Vanden Eijnden. Models for stochastic climate prediction. PNAS, 96(26), 1999. A. J. Majda, I. Timofeyev, and E. Vanden Eijnden. A mathematical framework for stochastic climate models. Communications on Pure and Applied Mathematics, 54(8):891{974, 2001. A. J. Majda, I. Timofeyev, and E. Vanden-Eijnden. A priori tests of a stochastic mode reduction strategy. Physica D: Nonlinear Phenomena, 170(3):206{252, 2002. A. J. Majda, I. Timofeyev, and E. Vanden-Eijnden. Systematic strategies for stochastic mode reduction in climate. Journal of the Atmospheric Sciences, 60(14):1705{1722, 2003. A. J. Majda, J. Harlim, and B. Gershgorin. Mathematical strategies for ltering turbulent dynamical systems. Discrete and Continuous Dynamical Systems, 27(2):441{486, 2010. J. Mateu, J. L. Us o, and F. Montes. The spatial pattern of a forest ecosystem. Ecological Modelling, 108, 1998. JJ McGuire and P Segall. Imaging of aseismic fault slip transients recorded by dense geodetic networks. Geophysical Journal International, 155(3):778{788, 2003. N. Metropolis and S. Ulam. The monte carlo method. Journal of the American Statistical Association, 44(247), 1949. RE Miller and EB Tadmor. The quasicontinuum method: Overview, applications and current directions. Journal of Computer-Aided Materials Design, Jan 2002. J. Mladeno. Landis and forest landscape models. Ecological Modelling, 180:7{19, 2004. J. Mller. Spatial Statistics and Computational Methods. Springer, Lecture Notes in Statis- tics, 2003. K. Mosegaard and A. Tatantola. Monte carlo sampling of solutions to inverse problems. Journal of Geophysical Research, 1995. G. Nemhauser, L. Wolsey, and M. Fisher. An analysis of the approximations for maximizing submodular set functions. Mathematical Porgramming, 14, 1978. R Ohtani, J. J. McGuire, and P Segall. The network strain lter-a new tool for monitoring and detecting transient deformation signals in gps arrays. AGU Fall Meeting Abstracts, 1, 2004. A. Ostfeld and E. Salomons. Optimal layout of early warning detection stations for water distribution systems security. Journal of Water Resources Planning and Management - ASCE, 130(5):377{385, 2004. S. W. Pacala, C. D. Canham, and J. A. Silander. Forest models dened by eld measurements: I. the design of a northeastern forest simulator. Canadian Journal of Forest Research, 23: 1980{1988, 1993. 107 T. N. Palmer. A nonlinear dynamical perspective on model error: A proposal for nonlo- cal stochasticdynamic parametrization in weather and climate prediction models. Quarterly Journal of the Royal Meteorological Society, Jan 2001. E. Pardoux. Book Cover Markov Processes and Applications : Algorithms, Networks, Genome and Finance. Wiley, 2009. N. Picard and A. Franc. Aggregation of an individual-based space-dependent model of forest dynamics into distribution-based and space-independent models. Ecological Modelling, 145: 69{84, 2001. N. Picard, A. Bar-Hen, F. Mortier, and J. Chadoeuf. Understanding the dynamics of an undisturbed tropical rain forest from the spatial pattern of trees. Journal of Ecology, 97: 97{108, 2009. L. Poorter and F. Bongers. Leaf traits are good predictors of plant performance across 53 rain forest species. Ecology, 87:1733{1743, 2006. M. Propato. Contamination warning in water networks: general mixed-integer linear models for sensor location design. Water Resources Planning and Management, 132(4), 2006. D. W. Purves, J. W. Lichstein, N. Strigul, and S. W. Pacala. Predicting and understanding forest dynamics using a simple tractable model. PNAS, 105(44):17018{17022, 2008. C. P. Robert and G. Casella. Monte Carlo Statistical Methods. New York: Springer-Verlag, 2004. L.A. Rossman. The epanet programmer's toolkit for analysis of water distribution systems. In Annual Water Resources Planning and Management Conference. Environmental Protection Agency, 1999. L.A. Rossman and P. F. Boulos. Numerical methods for modeling water quality in distribution systems: A comparison. Journal of Water Resources Planing and Management, 122(2):137{ 146, 1996. L.A. Rossman, P. F. Boulos, and T. Altman. Discrete volume element method for network water-quality models. Journal of Water Resources Planning and Management, 119(5), 1993. A. S arkk a. Modeling growth of trees by space-time growth-interaction processes. In Inter- national Workshop on Spatio-Temporal Modelling (METMA3), 2006. A. S arkk a and E. Renshaw. The analysis of marked point patterns evolving through space and time. Computational Statistics & Data Analysis, 51, 2006. J. C. Scott. Seeing like a state: how certain schemes to improve the human condition have failed. Yale University Press, 1998. 108 P. Segall and M. Matthews. Time dependent inversion of geodetic data. JOURNAL OF GEOPHYSICAL RESEARCH, 102, 1997. F. Shang, J. G. Uber, and M. M. Polycarpou. Particle backtracking algorithm for water distribution systems analysis. Journal of Environmental Engineering, 2002. Y. Shastri and U. Diwekar. Sensor placement in water networks: a stochastic programming approach. Water Resources Planning and Management, 132(3):192{203, 2006. VB Shenoy, R Miller, EB Tadmor, D Rodney, and R Phillips. An adaptive nite element approach to atomic-scale mechanics: The quasicontinuum method. Journal of Mechanics and Physics of Solids, 47:611{642, 1999. H. H. Shugart and D. C. West. Size and pattern of simulated forest stands. Forest Science, 25:120{122, 1979. D. Sornette. Critical Phenomena in Natural Sciences: Chaos, Fractals, Selforganization and Disorder: Concepts and Tools. Springer, 2006. J.C. Spall. Introduction to Stochastic Search and Optimization. Wiley, 2003. D. Stirzaker. Stochastic Processes and Models. Oxford University Press, 2005. D. Stoyan and A. Penttinen. Recent applications of point process methods in forestry statis- tics. Statistical Science, 15(1), 2000. D. Stoyan and H. Stoyan. Non-homogeneous gibbs process models for forestry - a case study. Biometrical Journal, 40(5), 1998. D. Stoyan, W.S. Kendall, and J. Mecke. Stochastic Geometry and its Applications. Wiley, 2008. N. Strigul, D. Pristinski, D. Purves, J. Dusho, and S. W. Pacala. Scaling from trees to forests: tractable macroscopic equations for forest dynamics. Ecological Monographs, 78: 523{545, 2008. M. E. Tryby, M. Propato, and S. Ranjithan. Monitoring design for source identication in water distribution systems. Journal of Water Resources Planning and Management, 136(6): 637{646, 2010. D. L. Urban, G. B. Bonan, T. M. Smith, and H. H. Shugart. Spatial applications of gap spatial applications of gap models. Forest Ecology and Management, 42(1-2), 1991. J.P. Watson, H.J. Greenberg, and W.E. Hart. A multiple-objective analysis of sensor place- ment optimization in water networks. In Critical Transitions In Water And Environmental Resources Management, 2004. 109 J.P. Watson, W.E. Hart, and R. Murray. Formulation and optimization of robust sensor place- ment problems for contaminant warning systems. In Water Distribution Systems Analysis Symposium, 2006. Martin Weickgenannt, Zoran Kapelan, Mirjam Blokker, and Dragan Savic. Risk-based sen- sor placement for contaminant detection in water distribution systems. Journal of Water Resources Planning and Management, 136(6):629{636, 2010. E Weinan. Analysis of the heterogeneous multiscale method for ordinary dierential equa- tions. Communications in Mathematical Sciences, 1(3):423{436, 2011. E Weinan and B Engquist. Multiscale modeling and computation. Notices of the AMS, 50 (9):1062{1070, 2003. E Weinan, B Engquist, X Li, W Ren, and E Vanden-Eijnden. Heterogeneous multiscale methods: a review. Commun Comput Phys, 2(3):367{450, 2007. H. Weiping, W. Xiuxin, and L. Yaling. A novel pitch period detection algorithm bases on hht with application to normal and pathological voice. Proceedings. 27th Annual International Conference of the Engineering in Medicine and Biology Society, pages 4541{4544, 2006. J. Wu and J. L. David. A spatially explicit hierarchical approach to modeling complex ecological systems: theory and applications. Ecological Modelling, 153(1-2), 2002. J. Wu, K. B. Jones, H. Li, and O.L. Loucks. Scaling And Uncertainty Analysis in Ecology: Methods And Applications. Springer, 2006. Y Xing, AJ Majda Majda, and W Grabowski. New ecient sparse space{time algorithms for superparameterization on mesoscales. journals.ametsoc.org, Jan 2010. J. Yang, H. S. He, and E. J. Gustason. A hierarchical re frequency model to simulate temporal patterns of re regimes in landis. Ecological Modelling, 180(1), 2004a. Z Yang, D Qi, and L Yang. Signal period analysis based on hilbert-huang transform and its application to texture analysis. Proceedings. Third International Conference on Image and Graphics, pages 430{433, 2004b. D Yinfeng, J Shuyan, and L Yingmin. An enhanced hilbert-huang transform method for analysis of earthquake ground motions. In 4th International Conference on Earthquake En- gineering Taipei, Taiwan. ICEE, Jan 2006. D Yinfeng, L Yingmin, X Mingkui, and L Ming. Analysis of earthquake ground motions using an improved hilbert-huang transform. Soil Dynamics and Earthquake Engineering, 28 (1):7{19, 2008. 110
Abstract (if available)
Abstract
This research reflects on both particular cases of ecologies: sensor networks in urban water distribution systems and forest dynamics under changing disturbance regime, and the elaboration of exploration tools to investigate and assess those complex ecologies. The predictive models that emerged from this work rely on sophisticated computer simulations designed based on stochastic and multiscale principles, and contribute to characterizing and evaluating the knowledge and information that is available about the systems under investigation. This thesis is organized into three distinct projects, each of which confronts specific challenges encountered when modeling complex systems. The first project considers the monitoring of water distribution networks where we describe a stochastic parameterization and analysis of uncertainty for the design of single-stage as well as two-stage sensor networks aimed at maximizing the probability of detection of accidents and intrusions in water distribution systems. Next, this study explores the ecological and evolutionary impacts of different disturbance regimes (generated following a stochastic Poisson process) on forests using the framework of a spatially explicit and individual-based forest model designed around four functional traits of trees. The final project investigates a multiscale characterization of forest dynamics using Monte-Carlo simulations of the fine scale dynamics to synthesize a coarse-grain stochastic model describing the dynamics of the system on larger spatial scales. In addition to modeling aspects, all three projects yield adequate computational performance, thereby assessing a recurrent challenge associated with the computational feasibility and performance of relevant numerical algorithms.
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
PDF
Model, identification & analysis of complex stochastic systems: applications in stochastic partial differential equations and multiscale mechanics
PDF
Algorithms for stochastic Galerkin projections: solvers, basis adaptation and multiscale modeling and reduction
PDF
Modeling of multiscale continuum-atomistic systems using homogenization theory
PDF
Efficient inverse analysis with dynamic and stochastic reductions for large-scale models of multi-component systems
PDF
Design optimization under uncertainty for rotor blades of horizontal axis wind turbines
PDF
Computationally efficient design of optimal strategies for passive and semiactive damping devices in smart structures
PDF
Studies into vibration-signature-based methods for system identification, damage detection and health monitoring of civil infrastructures
PDF
Stochastic data assimilation with application to multi-phase flow and health monitoring problems
PDF
Studies into computational intelligence approaches for the identification of complex nonlinear systems
PDF
Stochastic models: simulation and heavy traffic analysis
PDF
Hybrid physics-based and data-driven computational approaches for multi-scale hydrologic systems under heterogeneity
PDF
Optimal clipped linear strategies for controllable damping
PDF
Stochastic dynamic power and thermal management techniques for multicore systems
PDF
Active state tracking in heterogeneous sensor networks
PDF
A stochastic employment problem
PDF
Experimental and analytical studies of infrastructure systems condition assessment using different sensing modality
PDF
Computational validation of stochastic programming models and applications
PDF
A polynomial chaos formalism for uncertainty budget assessment
PDF
Understanding dynamics of cyber-physical systems: mathematical models, control algorithms and hardware incarnations
PDF
Efficient stochastic simulations of hydrogeological systems: from model complexity to data assimilation
Asset Metadata
Creator
Comboul, Maud
(author)
Core Title
Stochastic and multiscale models for urban and natural ecology
School
Viterbi School of Engineering
Degree
Doctor of Philosophy
Degree Program
Civil Engineering (Environmental Engineering)
Publication Date
07/27/2012
Defense Date
05/09/2012
Publisher
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
forest dynamics,greedy algorithm,Markov dynamics,Monte Carlo simulations agent-based model,multiscale dynamics,OAI-PMH Harvest,sensor network,stochastic optimization
Language
English
Contributor
Electronically uploaded by the author
(provenance)
Advisor
Ghanem, Roger G. (
committee chair
), Johnson, Erik A. (
committee member
), Masri, Sami F. (
committee member
), Nakano, Aiichiro (
committee member
), Newton, Paul K. (
committee member
)
Creator Email
comboul@usc.edu,maudcomboul@gmail.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c3-72481
Unique identifier
UC11290196
Identifier
usctheses-c3-72481 (legacy record id)
Legacy Identifier
etd-ComboulMau-1045.pdf
Dmrecord
72481
Document Type
Dissertation
Rights
Comboul, Maud
Type
texts
Source
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Tags
forest dynamics
greedy algorithm
Markov dynamics
Monte Carlo simulations agent-based model
multiscale dynamics
sensor network
stochastic optimization